Re: KafkaUtils.createStream(..) is removed for API

2018-02-18 Thread naresh Goud
Thanks Ted. I see createDirectStream is experimental, as it is annotated with "org.apache.spark.annotation.Experimental". Is it possible that this API will be removed in the future? We want to use it in one of our production jobs and are afraid it will not be supported in the future. Thank you,

Re: Does Pyspark Support Graphx?

2018-02-18 Thread xiaobo
Another question is how to install graphframes permanently when the Spark nodes cannot connect to the internet. -- Original -- From: Denny Lee Date: Mon, Feb 19, 2018 10:23 AM To: xiaobo Cc: user@spark.apache.org

[graphframes] How GraphFrames Deals With Bidirectional Relationships

2018-02-18 Thread xiaobo
Hi, To represent a bidirectional relationship, one solution is to insert two edges for each vertex pair. My question is: do the GraphFrames algorithms still work correctly when we do this? Thanks
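A minimal sketch of the "two edges per pair" approach in plain Python, independent of GraphFrames (which expects an edges DataFrame with src and dst columns). The helper name to_bidirectional and the (src, dst, relationship) tuple layout are hypothetical. Whether each GraphFrames algorithm gives the intended result on the doubled edges is the open question in this thread and is not tested here; one visible side effect is that every undirected link now adds one to both the in-degree and the out-degree of each endpoint.

```python
# Hypothetical helper, not part of the GraphFrames API: represent each
# undirected relationship as two directed edges (one per direction).
from typing import Iterable, List, Tuple

Edge = Tuple[str, str, str]  # (src, dst, relationship)


def to_bidirectional(edges: Iterable[Edge]) -> List[Edge]:
    """Return the input edges plus a reversed copy of each, without duplicates."""
    seen = set()
    result: List[Edge] = []
    for src, dst, rel in edges:
        for edge in ((src, dst, rel), (dst, src, rel)):
            # Skip edges already present, e.g. pairs inserted both ways or self-loops.
            if edge not in seen:
                seen.add(edge)
                result.append(edge)
    return result


print(to_bidirectional([("a", "b", "friend"), ("b", "c", "friend")]))
```

Deduplicating keeps the sketch safe to run on data where some pairs were already stored in both directions.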

Re: Does Pyspark Support Graphx?

2018-02-18 Thread Denny Lee
Note the --packages option works for both PySpark and Spark (Scala). For the SparkLauncher class, you should be able to include packages like so: spark.addSparkArg("--packages", "graphframes:0.5.0-spark2.0-s_2.11") On Sun, Feb 18, 2018 at 3:30 PM xiaobo wrote: > Hi Denny,
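SparkLauncher starts spark-submit as a child process, so the --packages argument above is the same flag spark-submit accepts: a comma-separated list of Maven coordinates. A minimal Python sketch, assuming spark-submit is on the PATH, builds the equivalent command line; the helper spark_submit_args and the script name my_graph_job.py are hypothetical.

```python
# Sketch: build a spark-submit command that pulls graphframes via --packages.
# Assumes spark-submit is on PATH; the script name is a placeholder.
import shlex
from typing import List, Optional


def spark_submit_args(app: str, packages: List[str],
                      master: Optional[str] = None) -> List[str]:
    """Return a spark-submit argument list for the given app and Maven packages."""
    args = ["spark-submit"]
    if master is not None:
        args += ["--master", master]
    # --packages expects a single comma-separated list of Maven coordinates.
    args += ["--packages", ",".join(packages)]
    args.append(app)
    return args


cmd = spark_submit_args("my_graph_job.py", ["graphframes:0.5.0-spark2.0-s_2.11"])
print(shlex.join(cmd))  # requires Python 3.8+ for shlex.join
# subprocess.run(cmd, check=True) would launch the job.
```

Keeping the arguments as a list (rather than one shell string) avoids quoting problems when it is later handed to subprocess.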

[SparkQL] how are RDDs partitioned and distributed in a standalone cluster?

2018-02-18 Thread prabhastechie
Say I have a main method with the following pseudo-code (to be run on a spark standalone cluster):
main(args) {
  RDD rdd
  rdd1 = rdd.map(...)
  // some other statements not using RDD
  rdd2 = rdd.filter(...)
}
When executed, will each of the two statements involving RDDs (map and filter) be

Re: KafkaUtils.createStream(..) is removed for API

2018-02-18 Thread Ted Yu
createStream() is still in external/kafka-0-8/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala But it is not in external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaUtils.scala FYI On Sun, Feb 18, 2018 at 5:17 PM, naresh Goud

KafkaUtils.createStream(..) is removed for API

2018-02-18 Thread naresh Goud
Hello Team, I see that the KafkaUtils.createStream() method is not available in Spark 2.2.1. Can someone please confirm whether this method has been removed? Below are my pom.xml entries: Scala 2.11.8 (scala.tools.version 2.11) and the dependency org.apache.spark:spark-streaming_${scala.tools.version}:2.2.1 with scope provided.

Re: Does Pyspark Support Graphx?

2018-02-18 Thread xiaobo
Hi Denny, The pyspark script uses the --packages option to load the graphframes library; what about the SparkLauncher class? -- Original -- From: Denny Lee Date: Sun, Feb 18, 2018 11:07 AM To: 94035420 Cc:

GC issues with spark job

2018-02-18 Thread Nikhil Goyal
Hi, I have a job that spends approximately 30% of its time in GC. Looking at the logs, it seems GC is triggered before the spill happens. Is there a config setting I can use to force Spark to spill earlier, for example when memory is 60-70% full? Thanks Nikhil
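The snippet carries no reply, so what follows is only a hedged sketch of the arithmetic behind one commonly discussed knob. In Spark 2.x's unified memory model, execution and storage share a pool of roughly (heap - 300 MB) * spark.memory.fraction (default 0.6), with spark.memory.storageFraction (default 0.5) marking the storage share protected from eviction; execution spills when it cannot acquire more from that pool. Lowering spark.memory.fraction therefore shrinks the pool and makes spills happen earlier relative to heap size. Whether that cures this particular GC problem is an assumption, and the function name and 4096 MB heap are illustrative.

```python
# Sketch of Spark 2.x unified-memory arithmetic (assumes the default unified
# memory manager; ignores off-heap memory and JVM overhead).
from typing import Tuple

RESERVED_MB = 300  # memory set aside before spark.memory.fraction is applied


def unified_memory_mb(heap_mb: float,
                      memory_fraction: float = 0.6,   # spark.memory.fraction default
                      storage_fraction: float = 0.5,  # spark.memory.storageFraction default
                      ) -> Tuple[float, float]:
    """Return (unified execution+storage pool, protected storage region) in MB."""
    if heap_mb <= RESERVED_MB:  # simple sanity check for this sketch only
        raise ValueError("heap must be larger than the reserved memory")
    unified = (heap_mb - RESERVED_MB) * memory_fraction
    storage = unified * storage_fraction
    return unified, storage


# Compare the default fraction with a lower one on a hypothetical 4 GB heap.
for fraction in (0.6, 0.4):
    unified, storage = unified_memory_mb(4096, memory_fraction=fraction)
    print(f"spark.memory.fraction={fraction}: unified={unified:.1f} MB, storage={storage:.1f} MB")
```

The trade-off is that a smaller pool leaves more heap for user objects but can mean more frequent spills to disk.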

Re: Does Pyspark Support Graphx?

2018-02-18 Thread Felix Cheung
Hi - I’m maintaining it. As of now there is an issue with 2.2 that breaks personalized page rank, and that’s largely the reason there isn’t a release for 2.2 support. There are attempts to address this issue - if you are interested, we would love your help.

[Pyspark Streaming + ml] How to combine

2018-02-18 Thread Romain Jouin
Hi, I am trying to apply a Spark random forest to a stream with Python. I couldn't find much on this subject online. Is there an example somewhere? I asked the question, with my code, details, examples and resources, on Stack Overflow:

Re: Does Pyspark Support Graphx?

2018-02-18 Thread Nicolas Paris
> Most likely not, as most of the effort is currently on GraphFrames - a great blog post on what GraphFrames offers can be found at: https:// Is the graphframes package still active? The GitHub repository indicates it's not extremely active. Right now, there is no available package for