Re: Does Pyspark Support Graphx?

2018-02-18 Thread Nicolas Paris
> Most likely not, as most of the effort is currently on GraphFrames - a great
> blog post on what GraphFrames offers can be found at: https://

Is the graphframes package still active? The GitHub repository
indicates it is not very active. Right now there is no package
available for Spark 2.2, so one needs to compile it from source.

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



[Pyspark Streaming + ml] How to combine

2018-02-18 Thread Romain Jouin
Hi,

I am trying to apply a Spark random forest to a stream with Python. I
couldn't find much on this subject on the net.

Is there an example somewhere?

I asked the question, with my code, details, examples, and resources, on
Stack Overflow:
https://stackoverflow.com/questions/48846882/pyspark-ml-streaming
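One workable pattern - sketched below with hypothetical names, and simulated in plain Python so it runs without a cluster - is to train the model offline with pyspark.ml, save it, and then score each micro-batch inside foreachRDD by converting the RDD to a DataFrame and calling model.transform(). The stand-in `score_batch` below plays the role of that per-batch transform step:

```python
# Sketch of per-micro-batch scoring, simulated in plain Python so it runs
# without a Spark cluster. In real PySpark you would load a saved
# RandomForestClassificationModel and, inside dstream.foreachRDD, convert
# each RDD to a DataFrame and call model.transform(df). The threshold
# "model" below is a hypothetical stand-in for the fitted forest.

def make_model(threshold):
    """Stand-in for a fitted model: predicts 1 when the feature sum
    exceeds `threshold` (the role model.transform plays per row)."""
    def predict(row):
        return 1 if sum(row) > threshold else 0
    return predict

def score_batch(model, batch):
    """Mimics the foreachRDD body: score every row of one micro-batch."""
    return [(row, model(row)) for row in batch]

if __name__ == "__main__":
    model = make_model(threshold=1.0)
    # Two simulated micro-batches, as a DStream would deliver them.
    batches = [[(0.2, 0.3), (0.9, 0.8)], [(1.5, 0.1)]]
    for batch in batches:
        print(score_batch(model, batch))
        # e.g. first batch -> [((0.2, 0.3), 0), ((0.9, 0.8), 1)]
```

The key constraint this works around is that pyspark.ml models require DataFrames, while a DStream hands you RDDs, so the conversion has to happen inside the per-batch callback.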

Any help appreciated,
Thanks.
Romain.


Re: Does Pyspark Support Graphx?

2018-02-18 Thread Felix Cheung
Hi - I’m maintaining it. As of now there is an issue with 2.2 that breaks 
personalized page rank, and that’s largely the reason there isn’t a release for 
2.2 support.

There are attempts to address this issue - if you are interested, we would love 
your help.


From: Nicolas Paris 
Sent: Sunday, February 18, 2018 12:31:27 AM
To: Denny Lee
Cc: xiaobo; user@spark.apache.org
Subject: Re: Does Pyspark Support Graphx?

> Most likely not, as most of the effort is currently on GraphFrames - a great
> blog post on what GraphFrames offers can be found at: https://

Is the graphframes package still active? The GitHub repository
indicates it is not very active. Right now there is no package
available for Spark 2.2, so one needs to compile it from source.




GC issues with spark job

2018-02-18 Thread Nikhil Goyal
Hi,

I have a job that is spending approximately 30% of its time in GC. When I
looked at the logs, it seems GC is triggered before the spill happens. I
wanted to know if there is a config setting I can use to force Spark to
spill earlier, maybe when memory is 60-70% full.
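As far as I know there is no single knob that forces a spill at a fixed fill level, but the unified memory manager settings below are the usual starting points for GC-heavy jobs. These are real Spark 2.x properties; the values shown are illustrative defaults, not recommendations:

```properties
# spark-defaults.conf style; values shown are the Spark 2.x defaults.
spark.memory.fraction         0.6   # share of the heap usable for execution + storage
spark.memory.storageFraction  0.5   # portion of the above protected from eviction
spark.executor.memory         8g    # illustrative; more headroom delays GC pressure
```

Lowering spark.memory.fraction leaves more heap for user objects and can reduce GC churn, at the cost of spilling to disk sooner.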

Thanks
Nikhil


Re: Does Pyspark Support Graphx?

2018-02-18 Thread xiaobo
Hi Denny,
The pyspark script uses the --packages option to load the graphframes library; 
what about the SparkLauncher class?




-- Original --
From: Denny Lee 
Date: Sun,Feb 18,2018 11:07 AM
To: 94035420 
Cc: user@spark.apache.org 
Subject: Re: Does Pyspark Support Graphx?



That's correct - you can use GraphFrames though as it does support PySpark.
On Sat, Feb 17, 2018 at 17:36 94035420  wrote:

I cannot find anything for the graphx module in the Python API documentation; 
does that mean it is not supported yet?

KafkaUtils.createStream(..) is removed for API

2018-02-18 Thread naresh Goud
Hello Team,

I see that the KafkaUtils.createStream() method is not available in Spark 2.2.1.

Can someone please confirm whether these methods have been removed?

Below are my pom.xml entries.



<properties>
  <scala.version>2.11.8</scala.version>
  <scala.tools.version>2.11</scala.tools.version>
</properties>

<dependencies>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_${scala.tools.version}</artifactId>
    <version>2.2.1</version>
    <scope>provided</scope>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
    <version>2.2.1</version>
    <scope>provided</scope>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.2.1</version>
    <scope>provided</scope>
  </dependency>
</dependencies>





Thank you,
Naresh


Re: KafkaUtils.createStream(..) is removed for API

2018-02-18 Thread Ted Yu
createStream() is still in
external/kafka-0-8/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala,
but it is not in
external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaUtils.scala.

FYI

On Sun, Feb 18, 2018 at 5:17 PM, naresh Goud 
wrote:

> Hello Team,
>
> I see that the KafkaUtils.createStream() method is not available in Spark 2.2.1.
>
> Can someone please confirm whether these methods have been removed?
>
> Below are my pom.xml entries.
>
>
> <properties>
>   <scala.version>2.11.8</scala.version>
>   <scala.tools.version>2.11</scala.tools.version>
> </properties>
>
> <dependencies>
>   <dependency>
>     <groupId>org.apache.spark</groupId>
>     <artifactId>spark-streaming_${scala.tools.version}</artifactId>
>     <version>2.2.1</version>
>     <scope>provided</scope>
>   </dependency>
>   <dependency>
>     <groupId>org.apache.spark</groupId>
>     <artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
>     <version>2.2.1</version>
>     <scope>provided</scope>
>   </dependency>
>   <dependency>
>     <groupId>org.apache.spark</groupId>
>     <artifactId>spark-core_2.11</artifactId>
>     <version>2.2.1</version>
>     <scope>provided</scope>
>   </dependency>
> </dependencies>
>
>
>
>
>
> Thank you,
> Naresh
>


[SparkQL] how are RDDs partitioned and distributed in a standalone cluster?

2018-02-18 Thread prabhastechie
Say I have a main method with the following pseudo-code (to be run on a Spark
standalone cluster):

main(args) {
  RDD rdd
  rdd1 = rdd.map(...)
  // some other statements not using RDDs
  rdd2 = rdd.filter(...)
}

When executed, will each of the two statements involving RDDs (map and
filter) be individually partitioned and distributed across the available
cluster nodes? And will any statements not involving RDDs (or DataFrames)
typically be executed on the driver?
Is that how Spark takes advantage of the cluster?



--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/




Re: Does Pyspark Support Graphx?

2018-02-18 Thread Denny Lee
Note the --packages option works for both PySpark and Spark (Scala).  For
the SparkLauncher class, you should be able to include packages like so:

spark.addSparkArg("--packages", "graphframes:0.5.0-spark2.0-s_2.11")


On Sun, Feb 18, 2018 at 3:30 PM xiaobo  wrote:

> Hi Denny,
> The pyspark script uses the --packages option to load graphframe library,
> what about the SparkLauncher class?
>
>
>
> -- Original --
> *From:* Denny Lee 
> *Date:* Sun,Feb 18,2018 11:07 AM
> *To:* 94035420 
> *Cc:* user@spark.apache.org 
> *Subject:* Re: Does Pyspark Support Graphx?
> That’s correct - you can use GraphFrames though as it does support
> PySpark.
> On Sat, Feb 17, 2018 at 17:36 94035420  wrote:
>
>> I cannot find anything for the graphx module in the Python API documentation;
>> does that mean it is not supported yet?
>>
>


[graphframes]how Graphframes Deal With Bidirectional Relationships

2018-02-18 Thread xiaobo
Hi,
To represent a bidirectional relationship, one solution is to insert two edges 
for each pair of vertices; my question is whether the GraphFrames algorithms 
still work when we do this.


Thanks
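Directed-graph algorithms do generally treat the two directions independently, so the usual way to encode an undirected relationship is indeed to insert both (src, dst) and (dst, src). The symmetrization step can be sketched in plain Python (no Spark required); with GraphFrames you would instead union the edges DataFrame with a column-swapped copy of itself:

```python
def symmetrize(edges):
    """Add the reverse of every edge, dropping duplicates, so that a
    directed algorithm sees the relationship in both directions.
    Mirrors unioning a GraphFrames edges DataFrame with its
    (dst, src)-swapped copy; names here are illustrative."""
    out = set(edges)
    out.update((dst, src) for src, dst in edges)
    return sorted(out)

if __name__ == "__main__":
    edges = [("a", "b"), ("b", "c")]
    print(symmetrize(edges))
    # -> [('a', 'b'), ('b', 'a'), ('b', 'c'), ('c', 'b')]
```

One caveat worth keeping in mind: algorithms that count or weight edges (e.g. degree-based ones) will see each undirected relationship twice, which may or may not be what you want.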

Re: Does Pyspark Support Graphx?

2018-02-18 Thread xiaobo
Another question: how can graphframes be installed permanently when the Spark 
nodes cannot connect to the internet?




-- Original --
From: Denny Lee 
Date: Mon,Feb 19,2018 10:23 AM
To: xiaobo 
Cc: user@spark.apache.org 
Subject: Re: Does Pyspark Support Graphx?



Note the --packages option works for both PySpark and Spark (Scala).  For the 
SparkLauncher class, you should be able to include packages like so:

spark.addSparkArg("--packages", "graphframes:0.5.0-spark2.0-s_2.11")


On Sun, Feb 18, 2018 at 3:30 PM xiaobo  wrote:

Hi Denny,
The pyspark script uses the --packages option to load graphframe library, what 
about the SparkLauncher class? 




-- Original --
From: Denny Lee 
Date: Sun,Feb 18,2018 11:07 AM
To: 94035420 
Cc: user@spark.apache.org 



Subject: Re: Does Pyspark Support Graphx?



That's correct - you can use GraphFrames though as it does support PySpark.
On Sat, Feb 17, 2018 at 17:36 94035420  wrote:

I cannot find anything for the graphx module in the Python API documentation; 
does that mean it is not supported yet?

Re: KafkaUtils.createStream(..) is removed for API

2018-02-18 Thread naresh Goud
Thanks Ted.

I see that createDirectStream is experimental, as it is annotated with
"org.apache.spark.annotation.Experimental".

Is it possible that this API will be removed in the future? We want to use it
in one of our production jobs and are concerned it may not be supported going
forward.

Thank you,
Naresh




On Sun, Feb 18, 2018 at 7:47 PM, Ted Yu  wrote:

> createStream() is still in
> external/kafka-0-8/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala,
> but it is not in
> external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaUtils.scala.
>
> FYI
>
> On Sun, Feb 18, 2018 at 5:17 PM, naresh Goud 
> wrote:
>
>> Hello Team,
>>
>> I see "KafkaUtils.createStream() " method not available in spark 2.2.1.
>>
>> Can someone please confirm if these methods are removed?
>>
>> below is my pom.xml entries.
>>
>>
>> <properties>
>>   <scala.version>2.11.8</scala.version>
>>   <scala.tools.version>2.11</scala.tools.version>
>> </properties>
>>
>> <dependencies>
>>   <dependency>
>>     <groupId>org.apache.spark</groupId>
>>     <artifactId>spark-streaming_${scala.tools.version}</artifactId>
>>     <version>2.2.1</version>
>>     <scope>provided</scope>
>>   </dependency>
>>   <dependency>
>>     <groupId>org.apache.spark</groupId>
>>     <artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
>>     <version>2.2.1</version>
>>     <scope>provided</scope>
>>   </dependency>
>>   <dependency>
>>     <groupId>org.apache.spark</groupId>
>>     <artifactId>spark-core_2.11</artifactId>
>>     <version>2.2.1</version>
>>     <scope>provided</scope>
>>   </dependency>
>> </dependencies>
>>
>>
>>
>>
>>
>> Thank you,
>> Naresh
>>
>
>