Thanks Ted.
I see createDirectStream is experimental, as it is annotated with
"org.apache.spark.annotation.Experimental".
Is it possible that this API will be removed in the future? We want to use it in
one of our production jobs and are concerned it may not be supported going
forward.
Thank you,
Another question: how can we install graphframes permanently when the Spark nodes
cannot connect to the internet?
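One common offline approach (offered only as a hedged suggestion, not something
confirmed in this thread) is to download the graphframes jar once on a machine
that does have internet access, copy it to the cluster, and pass it explicitly
instead of using --packages. For PySpark the same jar can also be passed via
--py-files, since it bundles the Python sources. Paths and file names below are
illustrative:

# illustrative paths; the jar is downloaded ahead of time and copied to the nodes
spark-submit \
  --jars /opt/spark/extra-jars/graphframes-0.5.0-spark2.0-s_2.11.jar \
  --py-files /opt/spark/extra-jars/graphframes-0.5.0-spark2.0-s_2.11.jar \
  my_graph_job.py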
-- Original --
From: Denny Lee
Date: Mon, Feb 19, 2018 10:23 AM
To: xiaobo
Cc: user@spark.apache.org
Hi,
To represent a bidirectional relationship, one solution is to insert two edges
for each vertex pair, one in each direction. My question is: do the graphframes
algorithms still work when we do this?
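As a minimal sketch of what I mean (the vertex and edge data are made up purely
for illustration), the bidirectional link is just two rows in the edges DataFrame:

import org.apache.spark.sql.SparkSession
import org.graphframes.GraphFrame

val spark = SparkSession.builder().appName("BidirectionalEdges").getOrCreate()
import spark.implicits._

// vertices need an "id" column
val vertices = Seq(("a", "Alice"), ("b", "Bob")).toDF("id", "name")

// one bidirectional relationship expressed as two directed edges
val edges = Seq(
  ("a", "b", "friend"),
  ("b", "a", "friend")
).toDF("src", "dst", "relationship")

val g = GraphFrame(vertices, edges)
g.inDegrees.show()  // each vertex now has both an incoming and an outgoing edge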
Thanks
Note that the --packages option works for both PySpark and Spark (Scala). For
the SparkLauncher class, you should be able to include packages like so:
spark.addSparkArg("--packages", "graphframes:graphframes:0.5.0-spark2.0-s_2.11")
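A slightly fuller sketch of how that might look end to end (the app resource,
main class, and master URL below are placeholders, not taken from this thread):

import org.apache.spark.launcher.SparkLauncher

val launcher = new SparkLauncher()
  .setAppResource("/path/to/my-graph-job.jar")   // placeholder application jar
  .setMainClass("com.example.GraphJob")          // placeholder main class
  .setMaster("spark://master-host:7077")         // placeholder master URL
  .addSparkArg("--packages", "graphframes:graphframes:0.5.0-spark2.0-s_2.11")

val process = launcher.launch()
process.waitFor()                                // block until the job finishes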
On Sun, Feb 18, 2018 at 3:30 PM xiaobo wrote:
> Hi Denny,
>
Say I have a main method with the following pseudo-code (to be run on a spark
standalone cluster):
def main(args: Array[String]): Unit = {
  val rdd: RDD[...] = ...   // some existing RDD
  val rdd1 = rdd.map(...)
  // some other statements not using RDDs
  val rdd2 = rdd.filter(...)
}
When executed, will each of the two statements involving RDDs (map and
filter) be
createStream() is still in
external/kafka-0-8/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala
but it is not in
external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaUtils.scala
FYI
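For reference, the kafka-0-10 module uses the direct stream API instead. A
minimal sketch (broker address, group id, and topic name are placeholders),
assuming the org.apache.spark:spark-streaming-kafka-0-10_2.11:2.2.1 dependency
is on the classpath:

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

val conf = new SparkConf().setAppName("KafkaDirectExample")
val ssc = new StreamingContext(conf, Seconds(5))

val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "broker-host:9092",    // placeholder broker
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "example-group",                // placeholder group id
  "auto.offset.reset" -> "latest",
  "enable.auto.commit" -> (false: java.lang.Boolean)
)

val stream = KafkaUtils.createDirectStream[String, String](
  ssc,
  PreferConsistent,
  Subscribe[String, String](Array("example-topic"), kafkaParams)
)

stream.map(record => (record.key, record.value)).print()
ssc.start()
ssc.awaitTermination()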
On Sun, Feb 18, 2018 at 5:17 PM, naresh Goud wrote:
Hello Team,
I see "KafkaUtils.createStream() " method not available in spark 2.2.1.
Can someone please confirm if these methods are removed?
below is my pom.xml entries.
<properties>
    <scala.version>2.11.8</scala.version>
    <scala.tools.version>2.11</scala.tools.version>
</properties>

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_${scala.tools.version}</artifactId>
    <version>2.2.1</version>
    <scope>provided</scope>
</dependency>
Hi Denny,
The pyspark script uses the --packages option to load the graphframes library;
what about the SparkLauncher class?
-- Original --
From: Denny Lee
Date: Sun, Feb 18, 2018 11:07 AM
To: 94035420
Cc:
Hi,
I have a job that is spending approximately 30% of its time in GC. When I looked
at the logs, it seems GC is triggering before the spill happens. I wanted to know
if there is a config setting I can use to force Spark to spill earlier, maybe
when memory is 60-70% full.
Thanks
Nikhil
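One knob that is sometimes lowered for this kind of problem (offered only as a
hedged sketch, not a confirmed fix for this particular job) is
spark.memory.fraction, which controls how much of the JVM heap Spark's unified
execution/storage region may use; shrinking it means operators hit their spill
threshold sooner. The values below are illustrative:

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// spark.memory.fraction defaults to 0.6 in Spark 2.x; lowering it shrinks the
// execution/storage region, so spilling kicks in earlier.
val conf = new SparkConf()
  .setAppName("EarlySpillExample")              // placeholder app name
  .set("spark.memory.fraction", "0.4")          // illustrative value, tune per workload
  .set("spark.memory.storageFraction", "0.3")   // illustrative value

val spark = SparkSession.builder().config(conf).getOrCreate()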
Hi - I’m maintaining it. As of now there is an issue with 2.2 that breaks
personalized page rank, and that’s largely the reason there isn’t a release for
2.2 support.
There are attempts to address this issue - if you are interested, we would love
your help.
Hi,
I am trying to apply a Spark random forest on a stream with Python. I couldn't
find much on this subject on the net.
Is there an example somewhere?
I asked the question with my code, details, examples, and resources on
Stack Overflow:
> Most likely not as most of the effort is currently on GraphFrames - a great
> blog post on what GraphFrames offers can be found at: https://
Is the graphframes package still active? The GitHub repository
indicates it's not extremely active. Right now, there is no available
package for