[jira] [Issue Comment Deleted] (SPARK-10658) Could pyspark provide addJars() as scala spark API?

2016-01-29 Thread Tony Cebzanov (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tony Cebzanov updated SPARK-10658:
--
Comment: was deleted

(was: I'm also noticing that the %addjar magic doesn't seem to work with
PySpark (it works fine in Scala). Is that related to this issue? If so, will
resolving this issue also allow %addjar to work?)

> Could pyspark provide addJars() as scala spark API? 
> 
>
> Key: SPARK-10658
> URL: https://issues.apache.org/jira/browse/SPARK-10658
> Project: Spark
>  Issue Type: Wish
>  Components: PySpark
>Affects Versions: 1.3.1
> Environment: Linux ubuntu 14.01 LTS
>Reporter: ryanchou
>  Labels: features
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> My Spark program was written with the PySpark API, and it uses the spark-csv
> jar library.
> I can submit the task with spark-submit and add a `--jars` argument to use the
> spark-csv jar library, as in the following command:
> ```
> /bin/spark-submit --jars /path/spark-csv_2.10-1.1.0.jar  xxx.py
> ```
> However, I need to run my unit tests like:
> ```
> py.test -vvs test_xxx.py
> ```
> In this case there is no way to add jars with a '--jars' argument.
> Therefore I tried to use the SparkContext.addPyFile() API to add jars in my
> test_xxx.py, because the addPyFile() docs mention PACKAGES_EXTENSIONS = (.zip,
> .py, .jar).
> Does that mean I can add *.jar files (jar libraries) with addPyFile()?
> The code that uses addPyFile() to add jars is below:
> ```
> self.sc.addPyFile(join(lib_path, "spark-csv_2.10-1.1.0.jar"))
> sqlContext = SQLContext(self.sc)
> self.dataframe = sqlContext.load(
> source="com.databricks.spark.csv",
> header="true",
> path="xxx.csv"
> )
> ```
> However, it doesn't work: sqlContext cannot load the
> source (com.databricks.spark.csv).
> Eventually I found another way: setting the environment variable
> SPARK_CLASSPATH to load the jar libraries:
> ```
> SPARK_CLASSPATH="/path/xxx.jar:/path/xxx2.jar" py.test -vvs test_xxx.py
> ```
> This loads the jar libraries, and sqlContext can then load the source
> successfully, just as it does with a `--jars xxx1.jar` argument.
> So the situation is: using third-party jars in PySpark-written scripts (.py and
> .egg files work well with addPyFile()),
> where `--jars` cannot be used (py.test -vvs test_xxx.py).
> Have you ever planned to provide an API such as addJars(), as in Scala, for
> adding jars to a Spark program, or is there a better way to add jars that I
> haven't found yet?
> If someone wants to add jars in PySpark-written scripts without using '--jars',
> could you give us some suggestions?
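
A possible workaround sketch, for anyone hitting the same problem: the
SPARK_CLASSPATH trick above can be moved into the test module itself, so that
py.test runs without a wrapper command. This is only a sketch under
assumptions -- Spark 1.3.x in local mode, a py.test session fixture, and
illustrative paths (LIB_PATH, xxx.csv); note that newer Spark releases
deprecate SPARK_CLASSPATH in favour of --jars / spark.driver.extraClassPath.

```
# test_xxx.py -- sketch only; LIB_PATH and the CSV path are placeholders.
import os
from os.path import join

import pytest
from pyspark import SparkContext
from pyspark.sql import SQLContext

LIB_PATH = "/path/to/libs"  # hypothetical directory holding the spark-csv jar


@pytest.fixture(scope="session")
def sql_context(request):
    # Export SPARK_CLASSPATH *before* the SparkContext launches the JVM, so
    # the driver picks up the jar -- the same effect as prefixing the py.test
    # command with SPARK_CLASSPATH=..., just done in-process.
    os.environ["SPARK_CLASSPATH"] = join(LIB_PATH, "spark-csv_2.10-1.1.0.jar")
    sc = SparkContext("local[2]", "csv-tests")
    request.addfinalizer(sc.stop)
    return SQLContext(sc)


def test_load_csv(sql_context):
    # Same load call as in the issue description; it succeeds only if the
    # spark-csv classes are actually on the driver classpath.
    df = sql_context.load(source="com.databricks.spark.csv",
                          header="true", path="xxx.csv")
    assert df.count() >= 0
```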



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10658) Could pyspark provide addJars() as scala spark API?

2016-01-29 Thread Tony Cebzanov (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15124298#comment-15124298
 ] 

Tony Cebzanov commented on SPARK-10658:
---

I'm also noticing that the %addjar magic doesn't seem to work with PySpark
(it works fine in Scala). Is that related to this issue? If so, will resolving
this issue also allow %addjar to work?

> Could pyspark provide addJars() as scala spark API? 
> 
>
> Key: SPARK-10658
> URL: https://issues.apache.org/jira/browse/SPARK-10658
> Project: Spark
>  Issue Type: Wish
>  Components: PySpark
>Affects Versions: 1.3.1
> Environment: Linux ubuntu 14.01 LTS
>Reporter: ryanchou
>  Labels: features
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> My Spark program was written with the PySpark API, and it uses the spark-csv
> jar library.
> I can submit the task with spark-submit and add a `--jars` argument to use the
> spark-csv jar library, as in the following command:
> ```
> /bin/spark-submit --jars /path/spark-csv_2.10-1.1.0.jar  xxx.py
> ```
> However, I need to run my unit tests like:
> ```
> py.test -vvs test_xxx.py
> ```
> In this case there is no way to add jars with a '--jars' argument.
> Therefore I tried to use the SparkContext.addPyFile() API to add jars in my
> test_xxx.py, because the addPyFile() docs mention PACKAGES_EXTENSIONS = (.zip,
> .py, .jar).
> Does that mean I can add *.jar files (jar libraries) with addPyFile()?
> The code that uses addPyFile() to add jars is below:
> ```
> self.sc.addPyFile(join(lib_path, "spark-csv_2.10-1.1.0.jar"))
> sqlContext = SQLContext(self.sc)
> self.dataframe = sqlContext.load(
> source="com.databricks.spark.csv",
> header="true",
> path="xxx.csv"
> )
> ```
> However, it doesn't work: sqlContext cannot load the
> source (com.databricks.spark.csv).
> Eventually I found another way: setting the environment variable
> SPARK_CLASSPATH to load the jar libraries:
> ```
> SPARK_CLASSPATH="/path/xxx.jar:/path/xxx2.jar" py.test -vvs test_xxx.py
> ```
> This loads the jar libraries, and sqlContext can then load the source
> successfully, just as it does with a `--jars xxx1.jar` argument.
> So the situation is: using third-party jars in PySpark-written scripts (.py and
> .egg files work well with addPyFile()),
> where `--jars` cannot be used (py.test -vvs test_xxx.py).
> Have you ever planned to provide an API such as addJars(), as in Scala, for
> adding jars to a Spark program, or is there a better way to add jars that I
> haven't found yet?
> If someone wants to add jars in PySpark-written scripts without using '--jars',
> could you give us some suggestions?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-8655) DataFrameReader#option supports more than String as value

2015-10-29 Thread Tony Cebzanov (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14980672#comment-14980672
 ] 

Tony Cebzanov commented on SPARK-8655:
--

I'm running into this limitation as well.

> DataFrameReader#option supports more than String as value
> -
>
> Key: SPARK-8655
> URL: https://issues.apache.org/jira/browse/SPARK-8655
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.4.0
>Reporter: Michael Nitschinger
>
> I'm working on a custom data source, porting it from 1.3 to 1.4.
> On 1.3 I could easily extend the Spark SQL imports and get access to the
> SQLContext, which meant I could use custom options right away. One of those
> options is a Filter that I pass down to my Relation for tighter schema
> inference against a schemaless database.
> So I would have something like:
> n1ql(filter: Filter = null, userSchema: StructType = null, bucketName: String
> = null)
> Since I want to move my API behind the DataFrameReader, the SQLContext is no
> longer available directly, only through the RelationProvider, which I've
> implemented and which works nicely.
> The only problem I have now is that while I can pass in custom options, they
> are all String-typed. So I have no way to pass down my optional Filter
> anymore (since parameters is a Map[String, String]).
> Would it be possible to extend the options so that more than just Strings can
> be passed in? Right now I probably need to work around that by documenting
> how people can pass in a string which I turn into a Filter, but that's
> somewhat hacky.
> Note that built-in implementations like JSON and JDBC have no such issues:
> since they can access the (private) SQLContext, they don't need to go through
> the decoupling of the RelationProvider and can take any custom arguments they
> want on their methods.
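
For what it's worth, the string workaround mentioned above can be sketched
from the caller's side with the PySpark reader API. This is only an
illustration under assumptions: the provider name com.example.n1ql and the
option keys (schemaFilter, bucket) are hypothetical, the JSON encoding is one
arbitrary choice, and the Scala-side RelationProvider would still have to
parse the string back into a Filter.

```
# Caller-side sketch only: the format name and option keys are made up.
import json

from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext("local[2]", "option-workaround")
sqlContext = SQLContext(sc)

# Reader options only carry string values (parameters: Map[String, String]),
# so a structured filter has to be serialized here and decoded again inside
# the custom RelationProvider.
schema_filter = json.dumps({"attribute": "type", "op": "EqualTo", "value": "airport"})

df = (sqlContext.read
      .format("com.example.n1ql")   # hypothetical data source
      .options(schemaFilter=schema_filter, bucket="travel-sample")
      .load())
```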



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org