[jira] [Commented] (SPARK-3928) Support wildcard matches on Parquet files
[ https://issues.apache.org/jira/browse/SPARK-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14539559#comment-14539559 ] Apache Spark commented on SPARK-3928: - User 'liancheng' has created a pull request for this issue: https://github.com/apache/spark/pull/5526 Support wildcard matches on Parquet files - Key: SPARK-3928 URL: https://issues.apache.org/jira/browse/SPARK-3928 Project: Spark Issue Type: Improvement Components: Spark Core, SQL Reporter: Nicholas Chammas Assignee: Cheng Lian Priority: Minor Fix For: 1.3.0 {{SparkContext.textFile()}} supports patterns like {{part-*}} and {{2014-\?\?-\?\?}}. It would be nice if {{SparkContext.parquetFile()}} did the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3928) Support wildcard matches on Parquet files
[ https://issues.apache.org/jira/browse/SPARK-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14539563#comment-14539563 ] Cheng Lian commented on SPARK-3928: --- While introducing the first class partitioning support for file system based data sources, HDFS style globbing is added in a more general way. And Parquet is also being migrated to this implementation. Please refer to https://github.com/liancheng/spark/pull/6 for more details. Support wildcard matches on Parquet files - Key: SPARK-3928 URL: https://issues.apache.org/jira/browse/SPARK-3928 Project: Spark Issue Type: Improvement Components: Spark Core, SQL Reporter: Nicholas Chammas Assignee: Cheng Lian Priority: Minor Fix For: 1.3.0 {{SparkContext.textFile()}} supports patterns like {{part-*}} and {{2014-\?\?-\?\?}}. It would be nice if {{SparkContext.parquetFile()}} did the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3928) Support wildcard matches on Parquet files
[ https://issues.apache.org/jira/browse/SPARK-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14534582#comment-14534582 ] Thu Kyaw commented on SPARK-3928: - Hello [~lian cheng] please let me know if you want me to work on adding back the glob support. Support wildcard matches on Parquet files - Key: SPARK-3928 URL: https://issues.apache.org/jira/browse/SPARK-3928 Project: Spark Issue Type: Improvement Components: Spark Core, SQL Reporter: Nicholas Chammas Assignee: Cheng Lian Priority: Minor Fix For: 1.3.0 {{SparkContext.textFile()}} supports patterns like {{part-*}} and {{2014-\?\?-\?\?}}. It would be nice if {{SparkContext.parquetFile()}} did the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3928) Support wildcard matches on Parquet files
[ https://issues.apache.org/jira/browse/SPARK-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14536114#comment-14536114 ] Yana Kadiyska commented on SPARK-3928: -- [~tkyaw] Your suggested workaround does work. One question though -- what are the implications of turning off spark.sql.parquet.useDataSourceApi? My particular concern is with predicate pushdowns into parquet -- am I going to lose these (it's hard to tell from the UI if pushdown is happening correctly). Also, can you clarify if you still plan to fix this for 1.4 or New parquet implementation does not contain wild card support yet means that we'd have to live with spark.sql.parquet.useDataSourceApi until further time? Support wildcard matches on Parquet files - Key: SPARK-3928 URL: https://issues.apache.org/jira/browse/SPARK-3928 Project: Spark Issue Type: Improvement Components: Spark Core, SQL Reporter: Nicholas Chammas Assignee: Cheng Lian Priority: Minor Fix For: 1.3.0 {{SparkContext.textFile()}} supports patterns like {{part-*}} and {{2014-\?\?-\?\?}}. It would be nice if {{SparkContext.parquetFile()}} did the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3928) Support wildcard matches on Parquet files
[ https://issues.apache.org/jira/browse/SPARK-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14536227#comment-14536227 ] Apache Spark commented on SPARK-3928: - User 'tkyaw' has created a pull request for this issue: https://github.com/apache/spark/pull/6025 Support wildcard matches on Parquet files - Key: SPARK-3928 URL: https://issues.apache.org/jira/browse/SPARK-3928 Project: Spark Issue Type: Improvement Components: Spark Core, SQL Reporter: Nicholas Chammas Assignee: Cheng Lian Priority: Minor Fix For: 1.3.0 {{SparkContext.textFile()}} supports patterns like {{part-*}} and {{2014-\?\?-\?\?}}. It would be nice if {{SparkContext.parquetFile()}} did the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3928) Support wildcard matches on Parquet files
[ https://issues.apache.org/jira/browse/SPARK-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533244#comment-14533244 ] Michael Armbrust commented on SPARK-3928: - Sorry for the confusion here. There have been a lot of improvements to this codepath, but unfortunately this feature was lost in the shuffle. Here are my thoughts: - Comma separated lists: were supported, will not be supported anymore. Use the varargs method to pass more than one file. This is because {{,}} is a valid character in a filename and so the old implementation was broken for some people. - Hadoop style globbing expressions: probably broken at the moment. should be supported. We will try to include them in Spark 1.4, time permitting. Support wildcard matches on Parquet files - Key: SPARK-3928 URL: https://issues.apache.org/jira/browse/SPARK-3928 Project: Spark Issue Type: Improvement Components: Spark Core, SQL Reporter: Nicholas Chammas Assignee: Cheng Lian Priority: Minor Fix For: 1.3.0 {{SparkContext.textFile()}} supports patterns like {{part-*}} and {{2014-\?\?-\?\?}}. It would be nice if {{SparkContext.parquetFile()}} did the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3928) Support wildcard matches on Parquet files
[ https://issues.apache.org/jira/browse/SPARK-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533277#comment-14533277 ] Nicholas Chammas commented on SPARK-3928: - {quote} Comma separated lists: were supported, will not be supported anymore. Use the varargs method to pass more than one file. This is because {{,}} is a valid character in a filename and so the old implementation was broken for some people. {quote} Isn't this inconsistent with how {{textFile()}} works? {{textFile()}} allows you to pass a single, comma-delimited string of file paths. If we want to support files with commas in their name -- which sounds like a corner case -- shouldn't we instead offer some kind of escaping mechanism for commas? It would be more work for those who have commas in their files names, but that seems like a fair tradeoff. If you do weird things, then you should expect to do more work. The advantage for the rest of us is that we get a consistent way of globbing files across {{textFile()}} and {{parquetFile()}}. Support wildcard matches on Parquet files - Key: SPARK-3928 URL: https://issues.apache.org/jira/browse/SPARK-3928 Project: Spark Issue Type: Improvement Components: Spark Core, SQL Reporter: Nicholas Chammas Assignee: Cheng Lian Priority: Minor Fix For: 1.3.0 {{SparkContext.textFile()}} supports patterns like {{part-*}} and {{2014-\?\?-\?\?}}. It would be nice if {{SparkContext.parquetFile()}} did the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3928) Support wildcard matches on Parquet files
[ https://issues.apache.org/jira/browse/SPARK-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533683#comment-14533683 ] Cheng Lian commented on SPARK-3928: --- Hey guys, sorry for all the trouble. We are adding back globbing expression back to Parquet in 1.4. Support wildcard matches on Parquet files - Key: SPARK-3928 URL: https://issues.apache.org/jira/browse/SPARK-3928 Project: Spark Issue Type: Improvement Components: Spark Core, SQL Reporter: Nicholas Chammas Assignee: Cheng Lian Priority: Minor Fix For: 1.3.0 {{SparkContext.textFile()}} supports patterns like {{part-*}} and {{2014-\?\?-\?\?}}. It would be nice if {{SparkContext.parquetFile()}} did the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3928) Support wildcard matches on Parquet files
[ https://issues.apache.org/jira/browse/SPARK-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14532663#comment-14532663 ] Marius Soutier commented on SPARK-3928: --- DataFrames now expect varagrs, i.e. df.parquetFile(/path/to/file/1,/path/to/file/2). Support wildcard matches on Parquet files - Key: SPARK-3928 URL: https://issues.apache.org/jira/browse/SPARK-3928 Project: Spark Issue Type: Improvement Components: Spark Core, SQL Reporter: Nicholas Chammas Priority: Minor Fix For: 1.3.0 {{SparkContext.textFile()}} supports patterns like {{part-*}} and {{2014-\?\?-\?\?}}. It would be nice if {{SparkContext.parquetFile()}} did the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3928) Support wildcard matches on Parquet files
[ https://issues.apache.org/jira/browse/SPARK-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14532716#comment-14532716 ] Yana Kadiyska commented on SPARK-3928: -- Then I think they should change the resolved status to Resolved-Won't fix if that's a conscious decision. I did make an edit to my previous comment as to why I'd love this to be different. Support wildcard matches on Parquet files - Key: SPARK-3928 URL: https://issues.apache.org/jira/browse/SPARK-3928 Project: Spark Issue Type: Improvement Components: Spark Core, SQL Reporter: Nicholas Chammas Priority: Minor Fix For: 1.3.0 {{SparkContext.textFile()}} supports patterns like {{part-*}} and {{2014-\?\?-\?\?}}. It would be nice if {{SparkContext.parquetFile()}} did the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3928) Support wildcard matches on Parquet files
[ https://issues.apache.org/jira/browse/SPARK-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14532704#comment-14532704 ] Marius Soutier commented on SPARK-3928: --- Wildcards were never supported and it seems they don't intend to change that. :( Support wildcard matches on Parquet files - Key: SPARK-3928 URL: https://issues.apache.org/jira/browse/SPARK-3928 Project: Spark Issue Type: Improvement Components: Spark Core, SQL Reporter: Nicholas Chammas Priority: Minor Fix For: 1.3.0 {{SparkContext.textFile()}} supports patterns like {{part-*}} and {{2014-\?\?-\?\?}}. It would be nice if {{SparkContext.parquetFile()}} did the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3928) Support wildcard matches on Parquet files
[ https://issues.apache.org/jira/browse/SPARK-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14532624#comment-14532624 ] Yana Kadiyska commented on SPARK-3928: -- I am observing the same issue. Downloaded a pre-built CDH4 1.3.1 distro. {quote} scala sc.textFile(/rum/warehouse/hive/pkey=-2015-04/-2015-04-serialno-750/*.parquet).first res0: String = PAR1? L??? ?p??? ,?�� ?? ???p?p??? ,�� ??�� ? ,?�� ?? ???p?p??? ,�� ??�� ? ,?�� ?? ???p?p??? ,�� ??�� ? ,?�� ?? ???p?p??? ,�� ??�� ? ,?��??? ???p?p??? ,��???���q?�qL?� �8��{???%??? ???/???(???�???�???9???�???�???2???#???M???0??? ???6???�???4???�???P???*??? ???�??? s scala hc.parquetFile(/rum/warehouse/hive/pkey=-2015-04/-2015-04-serialno-750/*.parquet) java.io.FileNotFoundException: File does not exist: hdfs://cdh4-21968-nn/rum/warehouse/hive/pkey=-2015-04/-2015-04-serialno-750/*.parquet {quote} Support wildcard matches on Parquet files - Key: SPARK-3928 URL: https://issues.apache.org/jira/browse/SPARK-3928 Project: Spark Issue Type: Improvement Components: Spark Core, SQL Reporter: Nicholas Chammas Priority: Minor Fix For: 1.3.0 {{SparkContext.textFile()}} supports patterns like {{part-*}} and {{2014-\?\?-\?\?}}. It would be nice if {{SparkContext.parquetFile()}} did the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3928) Support wildcard matches on Parquet files
[ https://issues.apache.org/jira/browse/SPARK-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14532680#comment-14532680 ] Yana Kadiyska commented on SPARK-3928: -- Marius, are you saying that wildcards are not supported then? in my case, I would really like to do /r/warehouse/hive/pkey=-2015-04/* (which works w/ textFile method btw) -- i.e. pass a single path for all April 2015 partitions. Enumerating all paths underneath is pretty crazy, that's a huge list. Are you saying that is the only way? I thought the whole point of this bug is that we _don't_ have to enumerate the paths explicitly. Also in my case hc is a HiveContext instance, not a dataframe. Support wildcard matches on Parquet files - Key: SPARK-3928 URL: https://issues.apache.org/jira/browse/SPARK-3928 Project: Spark Issue Type: Improvement Components: Spark Core, SQL Reporter: Nicholas Chammas Priority: Minor Fix For: 1.3.0 {{SparkContext.textFile()}} supports patterns like {{part-*}} and {{2014-\?\?-\?\?}}. It would be nice if {{SparkContext.parquetFile()}} did the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3928) Support wildcard matches on Parquet files
[ https://issues.apache.org/jira/browse/SPARK-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14532918#comment-14532918 ] Nicholas Chammas commented on SPARK-3928: - [~yanakad] - Do you see this same issue with Spark 1.3.0? [~marmbrus] - This issue needs an assignee since it was resolved as fixed. (Though as you can see there are questions about that.) Support wildcard matches on Parquet files - Key: SPARK-3928 URL: https://issues.apache.org/jira/browse/SPARK-3928 Project: Spark Issue Type: Improvement Components: Spark Core, SQL Reporter: Nicholas Chammas Priority: Minor Fix For: 1.3.0 {{SparkContext.textFile()}} supports patterns like {{part-*}} and {{2014-\?\?-\?\?}}. It would be nice if {{SparkContext.parquetFile()}} did the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3928) Support wildcard matches on Parquet files
[ https://issues.apache.org/jira/browse/SPARK-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14532922#comment-14532922 ] Nicholas Chammas commented on SPARK-3928: - it seems they don't intend to change that. Dunno on what basis you say that, but this is not true. As you can see in [the PR linked from this issue|https://github.com/apache/spark/pull/3407], this feature was added to Spark several months ago. Of course, there appear to be some issues with it, but they will probably be fixed as soon as we identify the root cause. Support wildcard matches on Parquet files - Key: SPARK-3928 URL: https://issues.apache.org/jira/browse/SPARK-3928 Project: Spark Issue Type: Improvement Components: Spark Core, SQL Reporter: Nicholas Chammas Priority: Minor Fix For: 1.3.0 {{SparkContext.textFile()}} supports patterns like {{part-*}} and {{2014-\?\?-\?\?}}. It would be nice if {{SparkContext.parquetFile()}} did the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3928) Support wildcard matches on Parquet files
[ https://issues.apache.org/jira/browse/SPARK-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14532969#comment-14532969 ] Marius Soutier commented on SPARK-3928: --- Because parquetFile now takes a varargs parameter which in turn is combined to a single path using mkString(,). This works just as before. The PR you link to still uses the old method with a single String parameter. It probably got lost in translation. Support wildcard matches on Parquet files - Key: SPARK-3928 URL: https://issues.apache.org/jira/browse/SPARK-3928 Project: Spark Issue Type: Improvement Components: Spark Core, SQL Reporter: Nicholas Chammas Priority: Minor Fix For: 1.3.0 {{SparkContext.textFile()}} supports patterns like {{part-*}} and {{2014-\?\?-\?\?}}. It would be nice if {{SparkContext.parquetFile()}} did the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3928) Support wildcard matches on Parquet files
[ https://issues.apache.org/jira/browse/SPARK-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14532978#comment-14532978 ] Nicholas Chammas commented on SPARK-3928: - Pinging [~lian cheng], who removed the test for Parquet glob support in [this commit|https://github.com/apache/spark/commit/ba19689fe77b90052b587640c9ff325c5a892c20#diff-f219bdbfc17f89b4e6a6b51c2dc1f92bL1023]. Support wildcard matches on Parquet files - Key: SPARK-3928 URL: https://issues.apache.org/jira/browse/SPARK-3928 Project: Spark Issue Type: Improvement Components: Spark Core, SQL Reporter: Nicholas Chammas Priority: Minor Fix For: 1.3.0 {{SparkContext.textFile()}} supports patterns like {{part-*}} and {{2014-\?\?-\?\?}}. It would be nice if {{SparkContext.parquetFile()}} did the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3928) Support wildcard matches on Parquet files
[ https://issues.apache.org/jira/browse/SPARK-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14395645#comment-14395645 ] Harut Martirosyan commented on SPARK-3928: -- This stopped working after refactoring/merge. Support wildcard matches on Parquet files - Key: SPARK-3928 URL: https://issues.apache.org/jira/browse/SPARK-3928 Project: Spark Issue Type: Improvement Components: Spark Core, SQL Reporter: Nicholas Chammas Priority: Minor Fix For: 1.3.0 {{SparkContext.textFile()}} supports patterns like {{part-*}} and {{2014-\?\?-\?\?}}. It would be nice if {{SparkContext.parquetFile()}} did the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3928) Support wildcard matches on Parquet files
[ https://issues.apache.org/jira/browse/SPARK-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14221415#comment-14221415 ] Apache Spark commented on SPARK-3928: - User 'tkyaw' has created a pull request for this issue: https://github.com/apache/spark/pull/3407 Support wildcard matches on Parquet files - Key: SPARK-3928 URL: https://issues.apache.org/jira/browse/SPARK-3928 Project: Spark Issue Type: Improvement Components: Spark Core, SQL Reporter: Nicholas Chammas Priority: Minor {{SparkContext.textFile()}} supports patterns like {{part-*}} and {{2014-\?\?-\?\?}}. It would be nice if {{SparkContext.parquetFile()}} did the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3928) Support wildcard matches on Parquet files
[ https://issues.apache.org/jira/browse/SPARK-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218287#comment-14218287 ] sam commented on SPARK-3928: CSV doesn't work for me. I'm using spark-submit and it appears to prepend hdfs://blarblar to the csv. E.g. /path/a,/path/b results in exception: Exception in thread main java.io.FileNotFoundException: File does not exist: hdfs://blarblar/path/a,path/b Support wildcard matches on Parquet files - Key: SPARK-3928 URL: https://issues.apache.org/jira/browse/SPARK-3928 Project: Spark Issue Type: Improvement Components: Spark Core, SQL Reporter: Nicholas Chammas Priority: Minor {{SparkContext.textFile()}} supports patterns like {{part-*}} and {{2014-\?\?-\?\?}}. It would be nice if {{SparkContext.parquetFile()}} did the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3928) Support wildcard matches on Parquet files
[ https://issues.apache.org/jira/browse/SPARK-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14181185#comment-14181185 ] Marius Soutier commented on SPARK-3928: --- This would be more than nice. Currently, `parquetFile()` supports comma-separated input, but this fails when one of those inputs is not available. A wildcard should solve that and make the API more consistent with other input methods. Support wildcard matches on Parquet files - Key: SPARK-3928 URL: https://issues.apache.org/jira/browse/SPARK-3928 Project: Spark Issue Type: Improvement Components: Spark Core, SQL Reporter: Nicholas Chammas Priority: Minor {{SparkContext.textFile()}} supports patterns like {{part-*}} and {{2014-\?\?-\?\?}}. It would be nice if {{SparkContext.parquetFile()}} did the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3928) Support wildcard matches on Parquet files
[ https://issues.apache.org/jira/browse/SPARK-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169580#comment-14169580 ] Nicholas Chammas commented on SPARK-3928: - cc [~marmbrus] Support wildcard matches on Parquet files - Key: SPARK-3928 URL: https://issues.apache.org/jira/browse/SPARK-3928 Project: Spark Issue Type: Improvement Components: Spark Core, SQL Reporter: Nicholas Chammas Priority: Minor {{SparkContext.textFile()}} supports patterns like {{part-*}} and {{2014-\?\?-\?\?}}. It would be nice if {{SparkContext.parquetFile()}} did the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org