[ https://issues.apache.org/jira/browse/LIVY-667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16935852#comment-16935852 ]
Marco Gaido commented on LIVY-667:
----------------------------------

[~yihengw] yes, that's true. The point is that the data needs to be transferred through JDBC anyway, so sending very large datasets over the wire may not be very efficient. Moreover, if you have a single very big partition, you can always repartition it and avoid the issue. My point here is that there are workarounds for this use case and I don't expect the problem to arise in typical usage, so designing a special mechanism for a corner case feels like overkill. It is also possible to do manually what is suggested here: i.e. create a table with the result of the query (this writes to HDFS) and then read that table...

> Support query a lot of data.
> ----------------------------
>
>                 Key: LIVY-667
>                 URL: https://issues.apache.org/jira/browse/LIVY-667
>             Project: Livy
>          Issue Type: Bug
>          Components: Thriftserver
>    Affects Versions: 0.6.0
>            Reporter: runzhiwang
>            Priority: Major
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When livy.server.thrift.incrementalCollect is enabled, the Thrift server uses toLocalIterator
> to load one partition at a time instead of the whole RDD, to avoid
> OutOfMemory errors. However, if the largest partition is too big, the OutOfMemory
> still occurs.
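
For illustration, here is a minimal Spark sketch of the two workarounds mentioned in the comment: repartitioning a skewed result, and materializing the result as a table before reading it back. The table name big_table, the partition count, and the result table name are assumptions made for the example only, not part of Livy or this issue.

{code:scala}
import org.apache.spark.sql.SparkSession

object Livy667Workarounds {
  def main(args: Array[String]): Unit = {
    // Assumes a Spark session; when livy.server.thrift.incrementalCollect is
    // enabled, the Thrift server fetches the result one partition at a time.
    val spark = SparkSession.builder().appName("livy-667-workarounds").getOrCreate()

    // Workaround 1: repartition a skewed result so that no single partition
    // has to fit in the server's memory at once. The partition count (200)
    // is illustrative.
    val result = spark.sql("SELECT * FROM big_table")
    result.repartition(200).createOrReplaceTempView("query_result")

    // Workaround 2: materialize the query result as a table (this writes to
    // HDFS) and then read that table back, instead of pulling the whole
    // result over JDBC in one go.
    spark.sql("CREATE TABLE query_result_tbl AS SELECT * FROM big_table")
    spark.table("query_result_tbl").show(20)

    spark.stop()
  }
}
{code}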