[ 
https://issues.apache.org/jira/browse/SPARK-14260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15217262#comment-15217262
 ] 

Hyukjin Kwon commented on SPARK-14260:
--------------------------------------

I am currently not sure whether increasing this value affects performance. I 
will investigate, and will close this issue if it turns out to matter.
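
For reference, a minimal sketch of the manual workaround mentioned in the 
description (assuming Spark 2.x's built-in CSV reader; the path and the limit 
value are placeholders):

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("maxCharsPerColumn-demo").getOrCreate()

// Raise the per-column character limit above the 1,000,000-character default
// so that rows containing very long fields do not fail the parse.
val df = spark.read
  .option("header", "true")
  .option("maxCharsPerColumn", "10000000") // placeholder: ~10M chars per column
  .csv("/path/to/data.csv")                // placeholder path
{code}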

> Increase default value for maxCharsPerColumn
> --------------------------------------------
>
>                 Key: SPARK-14260
>                 URL: https://issues.apache.org/jira/browse/SPARK-14260
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Hyukjin Kwon
>            Priority: Trivial
>
> I think the default value of the option {{maxCharsPerColumn}} is relatively 
> small: 1,000,000 characters, meaning about 976KB.
> Some users have run into this limit and ended up setting the value manually:
> https://github.com/databricks/spark-csv/issues/295
> https://issues.apache.org/jira/browse/SPARK-14103
> According to [univocity 
> API|http://docs.univocity.com/parsers/2.0.0/com/univocity/parsers/common/CommonSettings.html#setMaxCharsPerColumn(int)],
>  this exists to avoid {{OutOfMemoryErrors}}.
> If this does not harm performance, then I think it would be better to make 
> the default value much bigger (e.g. 10MB or 100MB) so that users do not have 
> to worry about the length of each field in a CSV file.
> Apparently the Apache CSV parser does not have such a limit.


