[jira] [Commented] (SPARK-3928) Support wildcard matches on Parquet files

2015-05-12 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14539559#comment-14539559
 ] 

Apache Spark commented on SPARK-3928:
-

User 'liancheng' has created a pull request for this issue:
https://github.com/apache/spark/pull/5526

 Support wildcard matches on Parquet files
 -

 Key: SPARK-3928
 URL: https://issues.apache.org/jira/browse/SPARK-3928
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core, SQL
Reporter: Nicholas Chammas
Assignee: Cheng Lian
Priority: Minor
 Fix For: 1.3.0


 {{SparkContext.textFile()}} supports patterns like {{part-*}} and 
 {{2014-\?\?-\?\?}}. 
 It would be nice if {{SparkContext.parquetFile()}} did the same.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3928) Support wildcard matches on Parquet files

2015-05-12 Thread Cheng Lian (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14539563#comment-14539563
 ] 

Cheng Lian commented on SPARK-3928:
---

While introducing the first class partitioning support for file system based 
data sources, HDFS style globbing is added in a more general way. And Parquet 
is also being migrated to this implementation. Please refer to 
https://github.com/liancheng/spark/pull/6 for more details.

 Support wildcard matches on Parquet files
 -

 Key: SPARK-3928
 URL: https://issues.apache.org/jira/browse/SPARK-3928
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core, SQL
Reporter: Nicholas Chammas
Assignee: Cheng Lian
Priority: Minor
 Fix For: 1.3.0


 {{SparkContext.textFile()}} supports patterns like {{part-*}} and 
 {{2014-\?\?-\?\?}}. 
 It would be nice if {{SparkContext.parquetFile()}} did the same.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3928) Support wildcard matches on Parquet files

2015-05-08 Thread Thu Kyaw (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14534582#comment-14534582
 ] 

Thu Kyaw commented on SPARK-3928:
-

Hello [~lian cheng] please let me know if you want me to work on adding back 
the glob support.

 Support wildcard matches on Parquet files
 -

 Key: SPARK-3928
 URL: https://issues.apache.org/jira/browse/SPARK-3928
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core, SQL
Reporter: Nicholas Chammas
Assignee: Cheng Lian
Priority: Minor
 Fix For: 1.3.0


 {{SparkContext.textFile()}} supports patterns like {{part-*}} and 
 {{2014-\?\?-\?\?}}. 
 It would be nice if {{SparkContext.parquetFile()}} did the same.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3928) Support wildcard matches on Parquet files

2015-05-08 Thread Yana Kadiyska (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14536114#comment-14536114
 ] 

Yana Kadiyska commented on SPARK-3928:
--

[~tkyaw] Your suggested workaround does work. One question though -- what are 
the implications of turning off spark.sql.parquet.useDataSourceApi? My 
particular concern is with predicate pushdowns into parquet -- am I going to 
lose these (it's hard to tell from the UI if pushdown is happening correctly). 

Also, can you clarify if you still plan to fix this for 1.4 or New parquet 
implementation does not contain wild card support yet means that we'd have to 
live with spark.sql.parquet.useDataSourceApi until further time?

 Support wildcard matches on Parquet files
 -

 Key: SPARK-3928
 URL: https://issues.apache.org/jira/browse/SPARK-3928
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core, SQL
Reporter: Nicholas Chammas
Assignee: Cheng Lian
Priority: Minor
 Fix For: 1.3.0


 {{SparkContext.textFile()}} supports patterns like {{part-*}} and 
 {{2014-\?\?-\?\?}}. 
 It would be nice if {{SparkContext.parquetFile()}} did the same.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3928) Support wildcard matches on Parquet files

2015-05-08 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14536227#comment-14536227
 ] 

Apache Spark commented on SPARK-3928:
-

User 'tkyaw' has created a pull request for this issue:
https://github.com/apache/spark/pull/6025

 Support wildcard matches on Parquet files
 -

 Key: SPARK-3928
 URL: https://issues.apache.org/jira/browse/SPARK-3928
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core, SQL
Reporter: Nicholas Chammas
Assignee: Cheng Lian
Priority: Minor
 Fix For: 1.3.0


 {{SparkContext.textFile()}} supports patterns like {{part-*}} and 
 {{2014-\?\?-\?\?}}. 
 It would be nice if {{SparkContext.parquetFile()}} did the same.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3928) Support wildcard matches on Parquet files

2015-05-07 Thread Michael Armbrust (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533244#comment-14533244
 ] 

Michael Armbrust commented on SPARK-3928:
-

Sorry for the confusion here.  There have been a lot of improvements to this 
codepath, but unfortunately this feature was lost in the shuffle.  Here are my 
thoughts:

 - Comma separated lists: were supported, will not be supported anymore.  Use 
the varargs method to pass more than one file.  This is because {{,}} is a 
valid character in a filename and so the old implementation was broken for some 
people.
 - Hadoop style globbing expressions: probably broken at the moment.  should be 
supported.  We will try to include them in Spark 1.4, time permitting.

 Support wildcard matches on Parquet files
 -

 Key: SPARK-3928
 URL: https://issues.apache.org/jira/browse/SPARK-3928
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core, SQL
Reporter: Nicholas Chammas
Assignee: Cheng Lian
Priority: Minor
 Fix For: 1.3.0


 {{SparkContext.textFile()}} supports patterns like {{part-*}} and 
 {{2014-\?\?-\?\?}}. 
 It would be nice if {{SparkContext.parquetFile()}} did the same.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3928) Support wildcard matches on Parquet files

2015-05-07 Thread Nicholas Chammas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533277#comment-14533277
 ] 

Nicholas Chammas commented on SPARK-3928:
-

{quote}
Comma separated lists: were supported, will not be supported anymore. Use the 
varargs method to pass more than one file. This is because {{,}} is a valid 
character in a filename and so the old implementation was broken for some 
people.
{quote}

Isn't this inconsistent with how {{textFile()}} works? {{textFile()}} allows 
you to pass a single, comma-delimited string of file paths.

If we want to support files with commas in their name -- which sounds like a 
corner case -- shouldn't we instead offer some kind of escaping mechanism for 
commas?

It would be more work for those who have commas in their files names, but that 
seems like a fair tradeoff. If you do weird things, then you should expect to 
do more work.

The advantage for the rest of us is that we get a consistent way of globbing 
files across {{textFile()}} and {{parquetFile()}}.

 Support wildcard matches on Parquet files
 -

 Key: SPARK-3928
 URL: https://issues.apache.org/jira/browse/SPARK-3928
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core, SQL
Reporter: Nicholas Chammas
Assignee: Cheng Lian
Priority: Minor
 Fix For: 1.3.0


 {{SparkContext.textFile()}} supports patterns like {{part-*}} and 
 {{2014-\?\?-\?\?}}. 
 It would be nice if {{SparkContext.parquetFile()}} did the same.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3928) Support wildcard matches on Parquet files

2015-05-07 Thread Cheng Lian (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533683#comment-14533683
 ] 

Cheng Lian commented on SPARK-3928:
---

Hey guys, sorry for all the trouble. We are adding back globbing expression 
back to Parquet in 1.4.

 Support wildcard matches on Parquet files
 -

 Key: SPARK-3928
 URL: https://issues.apache.org/jira/browse/SPARK-3928
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core, SQL
Reporter: Nicholas Chammas
Assignee: Cheng Lian
Priority: Minor
 Fix For: 1.3.0


 {{SparkContext.textFile()}} supports patterns like {{part-*}} and 
 {{2014-\?\?-\?\?}}. 
 It would be nice if {{SparkContext.parquetFile()}} did the same.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3928) Support wildcard matches on Parquet files

2015-05-07 Thread Marius Soutier (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14532663#comment-14532663
 ] 

Marius Soutier commented on SPARK-3928:
---

DataFrames now expect varagrs, i.e. 
df.parquetFile(/path/to/file/1,/path/to/file/2).


 Support wildcard matches on Parquet files
 -

 Key: SPARK-3928
 URL: https://issues.apache.org/jira/browse/SPARK-3928
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core, SQL
Reporter: Nicholas Chammas
Priority: Minor
 Fix For: 1.3.0


 {{SparkContext.textFile()}} supports patterns like {{part-*}} and 
 {{2014-\?\?-\?\?}}. 
 It would be nice if {{SparkContext.parquetFile()}} did the same.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3928) Support wildcard matches on Parquet files

2015-05-07 Thread Yana Kadiyska (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14532716#comment-14532716
 ] 

Yana Kadiyska commented on SPARK-3928:
--

Then I think they should change the resolved status to Resolved-Won't fix if 
that's a conscious decision. I did make an edit to my previous comment as to 
why I'd love this to be different.

 Support wildcard matches on Parquet files
 -

 Key: SPARK-3928
 URL: https://issues.apache.org/jira/browse/SPARK-3928
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core, SQL
Reporter: Nicholas Chammas
Priority: Minor
 Fix For: 1.3.0


 {{SparkContext.textFile()}} supports patterns like {{part-*}} and 
 {{2014-\?\?-\?\?}}. 
 It would be nice if {{SparkContext.parquetFile()}} did the same.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3928) Support wildcard matches on Parquet files

2015-05-07 Thread Marius Soutier (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14532704#comment-14532704
 ] 

Marius Soutier commented on SPARK-3928:
---

Wildcards were never supported and it seems they don't intend to change that. :(

 Support wildcard matches on Parquet files
 -

 Key: SPARK-3928
 URL: https://issues.apache.org/jira/browse/SPARK-3928
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core, SQL
Reporter: Nicholas Chammas
Priority: Minor
 Fix For: 1.3.0


 {{SparkContext.textFile()}} supports patterns like {{part-*}} and 
 {{2014-\?\?-\?\?}}. 
 It would be nice if {{SparkContext.parquetFile()}} did the same.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3928) Support wildcard matches on Parquet files

2015-05-07 Thread Yana Kadiyska (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14532624#comment-14532624
 ] 

Yana Kadiyska commented on SPARK-3928:
--

I am observing the same issue.

Downloaded a pre-built CDH4 1.3.1 distro.

{quote}
scala 
sc.textFile(/rum/warehouse/hive/pkey=-2015-04/-2015-04-serialno-750/*.parquet).first
res0: String = PAR1? L??? ?p??? ,?�� ?? ???p?p??? 
,�� ??�� ? ,?�� ?? ???p?p??? ,�� ??�� ? ,?�� ?? 
???p?p??? ,�� ??�� ? ,?�� ?? ???p?p??? ,�� ??�� 
? ,?��??? ???p?p??? ,��???���q?�qL?� 
�8��{???%??? 
???/???(???�???�???9???�???�???2???#???M???0??? 
???6???�???4???�???P???*??? ???�???
s
scala 
hc.parquetFile(/rum/warehouse/hive/pkey=-2015-04/-2015-04-serialno-750/*.parquet)
java.io.FileNotFoundException: File does not exist: 
hdfs://cdh4-21968-nn/rum/warehouse/hive/pkey=-2015-04/-2015-04-serialno-750/*.parquet

{quote}

 Support wildcard matches on Parquet files
 -

 Key: SPARK-3928
 URL: https://issues.apache.org/jira/browse/SPARK-3928
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core, SQL
Reporter: Nicholas Chammas
Priority: Minor
 Fix For: 1.3.0


 {{SparkContext.textFile()}} supports patterns like {{part-*}} and 
 {{2014-\?\?-\?\?}}. 
 It would be nice if {{SparkContext.parquetFile()}} did the same.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3928) Support wildcard matches on Parquet files

2015-05-07 Thread Yana Kadiyska (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14532680#comment-14532680
 ] 

Yana Kadiyska commented on SPARK-3928:
--

Marius, are you saying that wildcards are not supported then? in my case, I 
would really like to do /r/warehouse/hive/pkey=-2015-04/* (which works w/ 
textFile method btw) -- i.e. pass a single path for all April 2015 partitions. 
Enumerating all paths underneath is pretty crazy, that's a huge list.
 Are you saying that is the only way? I thought the whole point of this bug is 
that we _don't_ have to enumerate the paths explicitly. Also in my case hc is a 
HiveContext instance, not a dataframe.

 Support wildcard matches on Parquet files
 -

 Key: SPARK-3928
 URL: https://issues.apache.org/jira/browse/SPARK-3928
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core, SQL
Reporter: Nicholas Chammas
Priority: Minor
 Fix For: 1.3.0


 {{SparkContext.textFile()}} supports patterns like {{part-*}} and 
 {{2014-\?\?-\?\?}}. 
 It would be nice if {{SparkContext.parquetFile()}} did the same.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3928) Support wildcard matches on Parquet files

2015-05-07 Thread Nicholas Chammas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14532918#comment-14532918
 ] 

Nicholas Chammas commented on SPARK-3928:
-

[~yanakad] - Do you see this same issue with Spark 1.3.0?

[~marmbrus] - This issue needs an assignee since it was resolved as fixed. 
(Though as you can see there are questions about that.)

 Support wildcard matches on Parquet files
 -

 Key: SPARK-3928
 URL: https://issues.apache.org/jira/browse/SPARK-3928
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core, SQL
Reporter: Nicholas Chammas
Priority: Minor
 Fix For: 1.3.0


 {{SparkContext.textFile()}} supports patterns like {{part-*}} and 
 {{2014-\?\?-\?\?}}. 
 It would be nice if {{SparkContext.parquetFile()}} did the same.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3928) Support wildcard matches on Parquet files

2015-05-07 Thread Nicholas Chammas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14532922#comment-14532922
 ] 

Nicholas Chammas commented on SPARK-3928:
-

 it seems they don't intend to change that.

Dunno on what basis you say that, but this is not true. As you can see in [the 
PR linked from this issue|https://github.com/apache/spark/pull/3407], this 
feature was added to Spark several months ago.

Of course, there appear to be some issues with it, but they will probably be 
fixed as soon as we identify the root cause.

 Support wildcard matches on Parquet files
 -

 Key: SPARK-3928
 URL: https://issues.apache.org/jira/browse/SPARK-3928
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core, SQL
Reporter: Nicholas Chammas
Priority: Minor
 Fix For: 1.3.0


 {{SparkContext.textFile()}} supports patterns like {{part-*}} and 
 {{2014-\?\?-\?\?}}. 
 It would be nice if {{SparkContext.parquetFile()}} did the same.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3928) Support wildcard matches on Parquet files

2015-05-07 Thread Marius Soutier (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14532969#comment-14532969
 ] 

Marius Soutier commented on SPARK-3928:
---

Because parquetFile now takes a varargs parameter which in turn is combined to 
a single path using mkString(,). This works just as before. The PR you link 
to still uses the old method with a single String parameter. It probably got 
lost in translation.



 Support wildcard matches on Parquet files
 -

 Key: SPARK-3928
 URL: https://issues.apache.org/jira/browse/SPARK-3928
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core, SQL
Reporter: Nicholas Chammas
Priority: Minor
 Fix For: 1.3.0


 {{SparkContext.textFile()}} supports patterns like {{part-*}} and 
 {{2014-\?\?-\?\?}}. 
 It would be nice if {{SparkContext.parquetFile()}} did the same.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3928) Support wildcard matches on Parquet files

2015-05-07 Thread Nicholas Chammas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14532978#comment-14532978
 ] 

Nicholas Chammas commented on SPARK-3928:
-

Pinging [~lian cheng], who removed the test for Parquet glob support in [this 
commit|https://github.com/apache/spark/commit/ba19689fe77b90052b587640c9ff325c5a892c20#diff-f219bdbfc17f89b4e6a6b51c2dc1f92bL1023].

 Support wildcard matches on Parquet files
 -

 Key: SPARK-3928
 URL: https://issues.apache.org/jira/browse/SPARK-3928
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core, SQL
Reporter: Nicholas Chammas
Priority: Minor
 Fix For: 1.3.0


 {{SparkContext.textFile()}} supports patterns like {{part-*}} and 
 {{2014-\?\?-\?\?}}. 
 It would be nice if {{SparkContext.parquetFile()}} did the same.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3928) Support wildcard matches on Parquet files

2015-04-04 Thread Harut Martirosyan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14395645#comment-14395645
 ] 

Harut Martirosyan commented on SPARK-3928:
--

This stopped working after refactoring/merge.

 Support wildcard matches on Parquet files
 -

 Key: SPARK-3928
 URL: https://issues.apache.org/jira/browse/SPARK-3928
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core, SQL
Reporter: Nicholas Chammas
Priority: Minor
 Fix For: 1.3.0


 {{SparkContext.textFile()}} supports patterns like {{part-*}} and 
 {{2014-\?\?-\?\?}}. 
 It would be nice if {{SparkContext.parquetFile()}} did the same.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3928) Support wildcard matches on Parquet files

2014-11-21 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14221415#comment-14221415
 ] 

Apache Spark commented on SPARK-3928:
-

User 'tkyaw' has created a pull request for this issue:
https://github.com/apache/spark/pull/3407

 Support wildcard matches on Parquet files
 -

 Key: SPARK-3928
 URL: https://issues.apache.org/jira/browse/SPARK-3928
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core, SQL
Reporter: Nicholas Chammas
Priority: Minor

 {{SparkContext.textFile()}} supports patterns like {{part-*}} and 
 {{2014-\?\?-\?\?}}. 
 It would be nice if {{SparkContext.parquetFile()}} did the same.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3928) Support wildcard matches on Parquet files

2014-11-19 Thread sam (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218287#comment-14218287
 ] 

sam commented on SPARK-3928:


CSV doesn't work for me. I'm using spark-submit and it appears to prepend 
hdfs://blarblar to the csv.  E.g. /path/a,/path/b results in exception:

Exception in thread main java.io.FileNotFoundException: File does not exist: 
hdfs://blarblar/path/a,path/b

 Support wildcard matches on Parquet files
 -

 Key: SPARK-3928
 URL: https://issues.apache.org/jira/browse/SPARK-3928
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core, SQL
Reporter: Nicholas Chammas
Priority: Minor

 {{SparkContext.textFile()}} supports patterns like {{part-*}} and 
 {{2014-\?\?-\?\?}}. 
 It would be nice if {{SparkContext.parquetFile()}} did the same.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3928) Support wildcard matches on Parquet files

2014-10-23 Thread Marius Soutier (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14181185#comment-14181185
 ] 

Marius Soutier commented on SPARK-3928:
---

This would be more than nice. Currently, `parquetFile()` supports 
comma-separated input, but this fails when one of those inputs is not 
available. A wildcard should solve that and make the API more consistent with 
other input methods.


 Support wildcard matches on Parquet files
 -

 Key: SPARK-3928
 URL: https://issues.apache.org/jira/browse/SPARK-3928
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core, SQL
Reporter: Nicholas Chammas
Priority: Minor

 {{SparkContext.textFile()}} supports patterns like {{part-*}} and 
 {{2014-\?\?-\?\?}}. 
 It would be nice if {{SparkContext.parquetFile()}} did the same.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3928) Support wildcard matches on Parquet files

2014-10-13 Thread Nicholas Chammas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169580#comment-14169580
 ] 

Nicholas Chammas commented on SPARK-3928:
-

cc [~marmbrus]

 Support wildcard matches on Parquet files
 -

 Key: SPARK-3928
 URL: https://issues.apache.org/jira/browse/SPARK-3928
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core, SQL
Reporter: Nicholas Chammas
Priority: Minor

 {{SparkContext.textFile()}} supports patterns like {{part-*}} and 
 {{2014-\?\?-\?\?}}. 
 It would be nice if {{SparkContext.parquetFile()}} did the same.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org