[ https://issues.apache.org/jira/browse/SPARK-14260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15217262#comment-15217262 ]
Hyukjin Kwon commented on SPARK-14260:
--------------------------------------

I am currently not sure whether this value affects performance. I should investigate this, and will close the issue if it turns out to matter.

> Increase default value for maxCharsPerColumn
> --------------------------------------------
>
>                 Key: SPARK-14260
>                 URL: https://issues.apache.org/jira/browse/SPARK-14260
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Hyukjin Kwon
>            Priority: Trivial
>
> I think the default value of the option {{maxCharsPerColumn}} is relatively
> small: 1,000,000 characters, meaning roughly 976 KB.
> It looks like some users have run into this limit and ended up setting the
> value manually.
> https://github.com/databricks/spark-csv/issues/295
> https://issues.apache.org/jira/browse/SPARK-14103
> According to the [univocity
> API|http://docs.univocity.com/parsers/2.0.0/com/univocity/parsers/common/CommonSettings.html#setMaxCharsPerColumn(int)],
> this limit exists to avoid {{OutOfMemoryErrors}}.
> If raising it does not harm performance, I think it would be better to make
> the default value much bigger (e.g. 10 MB or 100 MB) so that users do not
> have to care about the length of each field in a CSV file.
> Apparently the Apache CSV parser does not have such a limit.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
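[Editor's note] As a sketch of the arithmetic behind the figures discussed above: the 976 KB number follows from treating one character as one byte (1,000,000 / 1024 ≈ 976.6 KiB); a JVM `char` is actually 2 bytes, so the real per-column buffer can be roughly twice that. The function name below is illustrative, not part of any Spark or univocity API.

```python
# Arithmetic behind the sizes discussed in SPARK-14260.
# Assumption: one byte per character, which is how the issue's
# 976 KB figure is derived; JVM chars are 2 bytes, so the actual
# in-memory buffer can be about twice as large.

def approx_kib(max_chars):
    """Approximate size in KiB of a buffer holding max_chars characters."""
    return max_chars / 1024.0

# Current univocity default: 1,000,000 characters
print("default: %.1f KiB" % approx_kib(1_000_000))        # ~976.6 KiB
# Proposed alternatives from the issue: 10 MB and 100 MB of characters
print("10 MB:   %.1f KiB" % approx_kib(10 * 1024 * 1024))
print("100 MB:  %.1f KiB" % approx_kib(100 * 1024 * 1024))
```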