Can a Precompiled Stand-Alone Python Application Be Submitted to a Spark Cluster?

2018-02-17 Thread xiaobo
Hi, To protect the IP of our software distributed to customers, one solution is to use precompiled Python scripts, but we are wondering whether this is a feature supported by PySpark. Thanks.
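The question is whether Spark can run Python bytecode shipped without source. Whether spark-submit accepts such modules via --py-files is exactly what the thread is asking, so the sketch below takes no position on that; it only shows the precompile step itself, using the stdlib, with an illustrative module name. The `margin` function and all paths are hypothetical.

```python
# Sketch: precompile a Python module to bytecode (.pyc) and confirm it
# still imports after the source is removed. Whether spark-submit will
# accept such a module via --py-files is the open question in the thread,
# not something this sketch establishes.
import importlib.util
import os
import py_compile
import tempfile

workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "secret_logic.py")
with open(src, "w") as f:
    f.write("def margin(price, cost):\n    return price - cost\n")

# Compile to a standalone .pyc, then delete the source.
pyc = py_compile.compile(src, cfile=os.path.join(workdir, "secret_logic.pyc"))
os.remove(src)  # distribute only the bytecode

# Load the module from bytecode alone, as a worker would have to.
spec = importlib.util.spec_from_file_location("secret_logic", pyc)
mod = importlib.util.module_from_spec(spec)
spec.loader.exec_module(mod)
print(mod.margin(10.0, 7.5))
```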

Re: Does Pyspark Support Graphx?

2018-02-17 Thread Denny Lee
Most likely not, as most of the effort is currently on GraphFrames - a great blog post on what GraphFrames offers can be found at: https://databricks.com/blog/2016/03/03/introducing-graphframes.html. Is there a particular scenario or situation that you're addressing that requires GraphX vs.

Re: Does Pyspark Support Graphx?

2018-02-17 Thread xiaobo
Thanks Denny, will it be supported in the near future?

Re: Does Pyspark Support Graphx?

2018-02-17 Thread Denny Lee
That’s correct - you can use GraphFrames though as it does support PySpark. On Sat, Feb 17, 2018 at 17:36 94035420 wrote: > I cannot find anything for the graphx module in the Python API document, does it mean it is not supported yet?

Does Pyspark Support Graphx?

2018-02-17 Thread 94035420
I cannot find anything for the graphx module in the Python API documentation; does that mean it is not supported yet?
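GraphFrames, the alternative suggested in this thread, models a graph as two DataFrames: a vertices table (with an "id" column) and an edges table (with "src" and "dst" columns). The sketch below is a plain-Python analogy of that data model and of a query like `inDegrees`, so it runs without Spark or GraphFrames installed; the column names mirror GraphFrames' conventions, but everything else is illustrative.

```python
# Plain-Python analogy (no Spark required) of the GraphFrames data model:
# a graph is just two tables, and a query like inDegrees is an ordinary
# aggregation over the edges table, grouping by destination vertex.
from collections import Counter

vertices = [{"id": "a"}, {"id": "b"}, {"id": "c"}]
edges = [
    {"src": "a", "dst": "b"},
    {"src": "b", "dst": "c"},
    {"src": "a", "dst": "c"},
]

# Equivalent in spirit to GraphFrame(v, e).inDegrees in PySpark.
in_degrees = Counter(e["dst"] for e in edges)
print(dict(in_degrees))  # c receives two edges, b receives one
```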

can we do self join on streaming dataset in 2.2.0?

2018-02-17 Thread kant kodali
Hi All, I know that stream-to-stream joins are not yet supported. From the text below I wonder if we can do self-joins on the same streaming dataset/dataframe in 2.2.0, since there are not two explicit streaming datasets or dataframes? Thanks!! In Spark 2.3, we have added support for
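To pin down the semantics the question is about: a self-join joins a dataset with itself on a key, producing every pair of rows that share that key. The batch sketch below illustrates just that semantics in plain Python; whether the same operation is allowed on a *streaming* Dataset in Spark 2.2 is the open question in the thread, and the column names here are illustrative.

```python
# A batch self-join in plain Python: every pair of rows sharing a key.
# This only illustrates the semantics being asked about; it says nothing
# about streaming support in any Spark version.
events = [
    {"user": "u1", "action": "view"},
    {"user": "u1", "action": "buy"},
    {"user": "u2", "action": "view"},
]

pairs = [
    (a, b)
    for a in events
    for b in events
    if a["user"] == b["user"]
]
print(len(pairs))  # u1 contributes 2x2 pairs, u2 contributes 1x1
```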

Re: Can spark handle this scenario?

2018-02-17 Thread Lian Jiang
Thanks Anastasios. This link is helpful! On Sat, Feb 17, 2018 at 11:05 AM, Anastasios Zouzias wrote: > Hi Lian, > > The remaining problem is: > > > Spark need all classes used in the fn() serializable for t.rdd.map{ k=> > fn(k) } to work. This could be hard since some classes

Re: Can spark handle this scenario?

2018-02-17 Thread Anastasios Zouzias
Hi Lian, The remaining problem is: Spark needs all classes used in fn() to be serializable for t.rdd.map{ k => fn(k) } to work. This could be hard since some classes in third-party libraries are not serializable. This restricts the power of using Spark to parallelize an operation on multiple machines.
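The serialization problem above can be shown without Spark: the driver must pickle whatever the shipped function captures, and objects holding things like locks or sockets refuse to pickle. A common workaround (for example inside mapPartitions) is to construct the awkward object on the worker instead of capturing it. The `Client` class below is a hypothetical stand-in for a non-serializable third-party class.

```python
# Why closures over third-party objects break: anything the shipped
# function captures must be picklable. A threading.Lock makes an object
# unpicklable, which stands in here for a non-serializable library class.
import pickle
import threading

class Client:
    def __init__(self):
        self._lock = threading.Lock()  # locks cannot be pickled
    def fetch(self, k):
        return k * 2

client = Client()
try:
    pickle.dumps(client)  # what serializing a capturing closure requires
    serializable = True
except TypeError:
    serializable = False

# Deferred construction: ship only a picklable recipe; each task builds
# its own Client locally, so nothing unpicklable crosses the wire.
def make_task():
    def task(keys):
        local = Client()  # created on the worker, never serialized
        return [local.fetch(k) for k in keys]
    return task

print(serializable, make_task()([1, 2, 3]))
```

In Spark this pattern usually appears as creating the client at the top of a `mapPartitions` function, so it is built once per partition rather than once per record.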

Re: Can spark handle this scenario?

2018-02-17 Thread Lian Jiang
Agreed. Thanks. On Sat, Feb 17, 2018 at 9:53 AM, Jörn Franke wrote: > You may want to think about separating the import step from the processing > step. It is not very economical to download all the data again every time > you want to calculate something. So download it

Re: Can spark handle this scenario?

2018-02-17 Thread Jörn Franke
You may want to think about separating the import step from the processing step. It is not very economical to download all the data again every time you want to calculate something. So download it first and store it on a distributed file system. Schedule to download newest information every
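The separation suggested above (download once on a schedule, process from storage as often as needed) can be sketched in plain Python. A local temporary directory stands in for the distributed filesystem, the "download" is faked with literal rows, and all names and file layouts are illustrative.

```python
# Sketch of splitting import from processing: step 1 lands raw data in
# shared storage (a local dir stands in for HDFS/S3); step 2 processes
# whatever has been landed and can be rerun without re-downloading.
import json
import os
import tempfile

storage = tempfile.mkdtemp()  # stand-in for a distributed filesystem

def import_step(day, rows):
    """Download (faked here) and land one day's data; run on a schedule."""
    path = os.path.join(storage, f"ticks-{day}.json")
    with open(path, "w") as f:
        json.dump(rows, f)
    return path

def process_step():
    """Read everything already landed; no network access needed."""
    rows = []
    for name in sorted(os.listdir(storage)):
        with open(os.path.join(storage, name)) as f:
            rows.extend(json.load(f))
    return sum(r["close"] - r["open"] for r in rows)

import_step("2018-02-16", [{"open": 10.0, "close": 11.0}])
import_step("2018-02-17", [{"open": 11.0, "close": 10.5}])
print(process_step())  # processing reruns cheaply against stored data
```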

Re: Can spark handle this scenario?

2018-02-17 Thread Lian Jiang
Snehasish, I got this in spark-shell (Scala 2.11.8): case class My(name:String, age:Int) import spark.implicits._ val t = List(new My("lian", 20), new My("sh", 3)).toDS t.map{ k=> print(My) }(org.apache.spark.sql.Encoders.kryo[My.getClass]) :31: error: type getClass is not a member of object My

Re: Can spark handle this scenario?

2018-02-17 Thread SNEHASISH DUTTA
Hi Lian, This could be the solution case class Symbol(symbol: String, sector: String) case class Tick(symbol: String, sector: String, open: Double, close: Double) // symbolDS is Dataset[Symbol], pullSymbolFromYahoo returns Dataset[Tick] symbolDs.map { k =>