[GitHub] spark issue #10572: SPARK-12619 Combine small files in a hadoop directory in...

2017-11-20 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/10572
  
@jinxing64 Yup, but what I meant is that most cases within Spark
datasources are covered by it. Empty files can be skipped with
`spark.hadoopRDD.ignoreEmptySplits`, and most other cases could probably be
covered by controlling the `InputFormat`.
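A minimal sketch of what the empty-split flag does, assuming a simplified model of `HadoopRDD.getPartitions`: one partition per input split, with zero-length splits dropped when `spark.hadoopRDD.ignoreEmptySplits` is true. `Split` and `hadoop_rdd_partitions` are hypothetical names for illustration, not Spark APIs.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Split:
    """Hypothetical stand-in for a Hadoop InputSplit."""
    path: str
    length: int  # split length in bytes


def hadoop_rdd_partitions(splits: List[Split], ignore_empty_splits: bool) -> List[Split]:
    """Simplified model: one RDD partition per input split.

    When ignore_empty_splits is True (modelling
    spark.hadoopRDD.ignoreEmptySplits), zero-length splits are dropped
    before partitions are created, so empty files yield no tasks.
    """
    if ignore_empty_splits:
        return [s for s in splits if s.length > 0]
    return list(splits)
```

Note that this only drops empty splits; it does not merge small non-empty files. Combining those is where a combining `InputFormat`, such as Hadoop's `CombineTextInputFormat`, would come in.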


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #10572: SPARK-12619 Combine small files in a hadoop directory in...

2017-11-19 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/10572
  
@HyukjinKwon 
To merge small files, should I tune `spark.sql.files.maxPartitionBytes`? 
But IIUC it only applies to `FileSourceScanExec`, so it has no effect when I 
select from a Hive table.
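For context, a rough Python model of how `FileSourceScanExec` (Spark 2.x) packs files into partitions, based on my reading of its "Next Fit Decreasing" logic: each file is charged `spark.sql.files.openCostInBytes` on top of its size, and the per-partition target is capped by `spark.sql.files.maxPartitionBytes`. The function and parameter names are hypothetical, the defaults mirror the documented 128 MB / 4 MB values, `default_parallelism` is an assumed value, and every file is treated as splittable.

```python
from typing import List

MB = 1024 * 1024


def pack_files(file_sizes: List[int],
               max_partition_bytes: int = 128 * MB,  # ~ spark.sql.files.maxPartitionBytes
               open_cost_in_bytes: int = 4 * MB,     # ~ spark.sql.files.openCostInBytes
               default_parallelism: int = 8) -> List[List[int]]:
    """Return partitions as lists of chunk sizes (bytes). Simplified model."""
    # Every file is charged an extra "open cost", so many tiny files still add up.
    total_bytes = sum(size + open_cost_in_bytes for size in file_sizes)
    bytes_per_core = total_bytes // default_parallelism
    max_split_bytes = min(max_partition_bytes, max(open_cost_in_bytes, bytes_per_core))

    # Cut each file into chunks of at most max_split_bytes (empty files yield none).
    chunks = []
    for size in file_sizes:
        offset = 0
        while offset < size:
            chunks.append(min(max_split_bytes, size - offset))
            offset += max_split_bytes
    chunks.sort(reverse=True)  # largest chunks first

    # Next Fit Decreasing: close the current partition when the next chunk won't fit.
    partitions: List[List[int]] = []
    current: List[int] = []
    current_size = 0
    for chunk in chunks:
        if current_size + chunk > max_split_bytes and current:
            partitions.append(current)
            current, current_size = [], 0
        current_size += chunk + open_cost_in_bytes
        current.append(chunk)
    if current:
        partitions.append(current)
    return partitions
```

With the defaults and a parallelism of 8, 100 files of 1 MB each end up in 8 partitions rather than 100. Since this logic lives in `FileSourceScanExec`, it would not apply to Hive table reads that go through `HadoopRDD` instead, which matches what is described above.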


---




[GitHub] spark issue #10572: SPARK-12619 Combine small files in a hadoop directory in...

2016-08-27 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/10572
  
Is that https://github.com/apache/spark/pull/12095?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---




[GitHub] spark issue #10572: SPARK-12619 Combine small files in a hadoop directory in...

2016-08-27 Thread cerisier
Github user cerisier commented on the issue:

https://github.com/apache/spark/pull/10572
  
@davies, do you have the commit that fixes this in 2.0?


---



[GitHub] spark issue #10572: SPARK-12619 Combine small files in a hadoop directory in...

2016-06-06 Thread davies
Github user davies commented on the issue:

https://github.com/apache/spark/pull/10572
  
This is fixed in 2.0. Could you close this PR?


---