[GitHub] spark issue #10572: SPARK-12619 Combine small files in a hadoop directory in...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/10572 @jinxing64 Yup but I think I intended to say most cases within Spark datasources are covered by it. Empty files could be skipped by `spark.hadoopRDD.ignoreEmptySplits` and probably most other cases could be covered by `InputFormat` control though. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #10572: SPARK-12619 Combine small files in a hadoop directory in...
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/10572 @HyukjinKwon To merge small files, should I tune `spark.sql.files.maxPartitionBytes`? But IIUC it only takes effect for `FileSourceScanExec`, so it doesn't work when I select from a Hive table.
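For readers unfamiliar with how `spark.sql.files.maxPartitionBytes` ends up merging small files on the `FileSourceScanExec` path, here is a minimal Python sketch of the greedy packing idea. It is a simplification written for illustration, not Spark's actual code: the function name `pack_files` is hypothetical, the per-file open cost (modelled on `spark.sql.files.openCostInBytes`, which is not mentioned in this thread) is an assumption, and the sketch ignores both the per-core adjustment of the split size and the splitting of files larger than the limit.

```python
from typing import List

MiB = 1024 * 1024


def pack_files(file_sizes: List[int],
               max_partition_bytes: int = 128 * MiB,
               open_cost_bytes: int = 4 * MiB) -> List[List[int]]:
    """Greedily pack files (given by size in bytes) into read partitions.

    Files are visited largest first. A partition is closed as soon as
    adding the next file would push it past max_partition_bytes. Each
    file also charges a fixed open cost, so many tiny files do not all
    end up in one partition.
    """
    partitions: List[List[int]] = []
    current: List[int] = []
    current_size = 0
    for size in sorted(file_sizes, reverse=True):
        # Close the current partition if this file would overflow it.
        if current and current_size + size > max_partition_bytes:
            partitions.append(current)
            current = []
            current_size = 0
        current.append(size)
        current_size += size + open_cost_bytes
    if current:
        partitions.append(current)
    return partitions


# Ten 1 MiB files, a 16 MiB partition limit, and a 4 MiB open cost.
parts = pack_files([1 * MiB] * 10,
                   max_partition_bytes=16 * MiB,
                   open_cost_bytes=4 * MiB)
print([len(p) for p in parts])  # prints [4, 4, 2]
```

In this sketch, lowering the open cost or raising the partition limit packs more small files into each partition. As noted in the comment above, this logic only applies to the file-source scan path, which is why tuning the setting has no effect on a Hive table read.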
[GitHub] spark issue #10572: SPARK-12619 Combine small files in a hadoop directory in...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/10572 Is that https://github.com/apache/spark/pull/12095? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
[GitHub] spark issue #10572: SPARK-12619 Combine small files in a hadoop directory in...
Github user cerisier commented on the issue: https://github.com/apache/spark/pull/10572 @davies do you have the commit that fixes this in 2.0?
[GitHub] spark issue #10572: SPARK-12619 Combine small files in a hadoop directory in...
Github user davies commented on the issue: https://github.com/apache/spark/pull/10572 This is fixed in 2.0, could you close this PR?