Hi,
To protect the IP of the software we distribute to customers, one option is
to ship precompiled Python scripts, but we are wondering whether this is a
feature that PySpark supports.
Thanks.
Most likely not, as most of the effort is currently on GraphFrames - a
great blog post on what GraphFrames offers can be found at:
https://databricks.com/blog/2016/03/03/introducing-graphframes.html. Is
there a particular scenario or situation that you're addressing that
requires GraphX vs. GraphFrames?
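[Editor's note: for reference, a minimal GraphFrames sketch. It is shown in the Scala API, matching the other code in this digest; the PySpark API mirrors it. It assumes the graphframes package is on the classpath and that it runs in spark-shell, where spark is predefined.]

import org.graphframes.GraphFrame
import spark.implicits._

// Vertices need an "id" column; edges need "src" and "dst" columns.
val vertices = Seq(("a", "Alice"), ("b", "Bob")).toDF("id", "name")
val edges = Seq(("a", "b", "follows")).toDF("src", "dst", "relationship")

val g = GraphFrame(vertices, edges)
g.inDegrees.show()  // graph metrics come back as plain DataFrames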
Thanks Denny, will it be supported in the near future?
-- Original --
From: Denny Lee
Date: Sun, Feb 18, 2018 11:05 AM
To: 94035420
Cc: user@spark.apache.org
Subject: Re: Does Pyspark Support
That’s correct - you can use GraphFrames though as it does support PySpark.
On Sat, Feb 17, 2018 at 17:36 94035420 wrote:
> I cannot find anything for the graphx module in the Python API
> documentation - does that mean it is not supported yet?
>
Hi All,
I know that stream-to-stream joins are not yet supported. From the text
below, I wonder whether we can do a self-join on the same streaming
dataset/DataFrame in 2.2.0, since a self-join does not involve two
explicitly distinct streaming datasets or DataFrames?
Thanks!!
In Spark 2.3, we have added support for
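[Editor's note: the quote above is cut off; it presumably refers to the stream-stream join support arriving in Spark 2.3. As far as I can tell, a self-join on a streaming Dataset still counts as a stream-stream join under the hood, so it is rejected before 2.3. A sketch of what such a self-join looks like once that support is available; the schema, path, and column names below are hypothetical.]

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.expr
import org.apache.spark.sql.types._

val spark = SparkSession.builder().appName("self-join-sketch").getOrCreate()

// Hypothetical schema and source path, for illustration only.
val schema = new StructType()
  .add("userId", StringType)
  .add("eventTime", TimestampType)

// One logical streaming source, referenced twice for the self-join.
val events = spark.readStream.schema(schema).json("/data/events")
  .withWatermark("eventTime", "10 minutes")

val joined = events.as("l").join(
  events.as("r"),
  expr("l.userId = r.userId AND l.eventTime <= r.eventTime"))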
Thanks Anastasios. This link is helpful!
On Sat, Feb 17, 2018 at 11:05 AM, Anastasios Zouzias
wrote:
> Hi Lian,
>
> The remaining problem is:
>
>
> Spark needs all classes used in fn() to be serializable for t.rdd.map{ k =>
> fn(k) } to work. This can be hard since some classes
Hi Lian,
The remaining problem is:
Spark needs all classes used in fn() to be serializable for t.rdd.map{ k =>
fn(k) } to work. This can be hard, since some classes in third-party
libraries are not serializable, and it restricts the power of using Spark to
parallelize an operation across multiple machines.
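[Editor's note: one common workaround, sketched below with a hypothetical non-serializable Client class: construct the object inside mapPartitions on each executor, so only the data crosses the wire and the object itself is never serialized with the closure.]

import org.apache.spark.sql.SparkSession

// Stand-in for a third-party class that is not Serializable.
class Client { def lookup(key: String): String = key.toUpperCase }

val spark = SparkSession.builder().appName("ser-sketch").getOrCreate()
val rdd = spark.sparkContext.parallelize(Seq("a", "b", "c"))

// The Client is built per partition on the executor, so it is never
// captured by the driver-side closure and never serialized.
val upper = rdd.mapPartitions { it =>
  val client = new Client()
  it.map(client.lookup)
}
upper.collect().foreach(println)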
Agreed. Thanks.
On Sat, Feb 17, 2018 at 9:53 AM, Jörn Franke wrote:
> You may want to think about separating the import step from the processing
> step. It is not very economical to download all the data again every time
> you want to calculate something. So download it
You may want to think about separating the import step from the processing
step. It is not very economical to download all the data again every time you
want to calculate something. So download it first and store it on a distributed
file system. Schedule a download of the newest information every
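[Editor's note: a minimal sketch of that split, with hypothetical paths and column names: a scheduled job ingests the download once into a columnar format on the distributed file system, and every later computation reads the stored copy instead of re-downloading.]

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("ingest-sketch").getOrCreate()

// Step 1 (scheduled, e.g. daily): persist the raw download once.
val raw = spark.read.json("/landing/raw-download")   // hypothetical path
raw.write.mode("overwrite").parquet("/warehouse/snapshot")

// Step 2 (run as often as needed): compute against the stored copy.
val data = spark.read.parquet("/warehouse/snapshot")
data.groupBy("category").count().show()              // hypothetical column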
Snehasish,
I got this in spark-shell 2.11.8:
case class My(name:String, age:Int)
import spark.implicits._
val t = List(new My("lian", 20), new My("sh", 3)).toDS
t.map{ k=> print(My) }(org.apache.spark.sql.Encoders.kryo[My.getClass])
:31: error: type getClass is not a member of object My
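[Editor's note: the compile error arises because Encoders.kryo takes a type parameter, not a runtime Class value, so My.getClass cannot appear in the brackets. A corrected spark-shell sketch, mapping k to itself so the encoder's type lines up; note that for a case class the implicit product encoder from spark.implicits._ already applies, which is exactly the point of the reply below.]

import org.apache.spark.sql.Encoders
import spark.implicits._

case class My(name: String, age: Int)

val t = List(My("lian", 20), My("sh", 3)).toDS

// Pass the type itself to Encoders.kryo, not a Class value.
val mapped = t.map { k => k }(Encoders.kryo[My])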
Hi Lian,
This could be the solution
case class Symbol(symbol: String, sector: String)
case class Tick(symbol: String, sector: String, open: Double, close: Double)
// symbolDs is Dataset[Symbol]; pullSymbolFromYahoo returns Dataset[Tick]
symbolDs.map { k =>
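[Editor's note: the message is cut off above. A hedged completion of the idea: because Symbol and Tick are case classes, the implicit encoders from spark.implicits._ apply and no kryo encoder is needed. One caveat: a Dataset cannot be built inside another Dataset's closure, so the per-symbol fetch has to return plain collections; fetchTicks below is a hypothetical stand-in for such a variant of pullSymbolFromYahoo.]

import spark.implicits._

// Hypothetical fetch for the sketch: returns plain Scala objects,
// not a Dataset, so it is safe to call inside the closure.
def fetchTicks(symbol: String, sector: String): Seq[Tick] = Seq.empty

// Case-class encoders from spark.implicits._ make this compile
// without any explicit kryo encoder.
val ticks = symbolDs.flatMap { k => fetchTicks(k.symbol, k.sector) }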