Why is Spark getting Kafka data out from port 2181 ?

2016-09-10 Thread Eric Ho
truststore in my Spark config ? Do I just give -D flags via JAVA_OPTS ? Thx -- -eric ho

how to pass trustStore path into pyspark ?

2016-09-02 Thread Eric Ho
I'm trying to pass a trustStore pathname into pyspark. Which env variable, config file, or script do I need to change to do this ? I've tried setting the JAVA_OPTS env var but to no avail... any pointers much appreciated... thx -- -eric ho

Re: how should I compose keyStore and trustStore if Spark needs to talk to Kafka & Cassandra ?

2016-09-01 Thread Eric Ho
I'm interested in what I should put into the trustStore file, not just for Spark but also for the Kafka and Cassandra sides... The ways I generated self-signed certs for the Kafka and Cassandra sides are slightly different... On Thu, Sep 1, 2016 at 1:09 AM, Eric Ho <e...@analyticsmd.com>

how should I compose keyStore and trustStore if Spark needs to talk to Kafka & Cassandra ?

2016-09-01 Thread Eric Ho
A working example would be great... Thx -- -eric ho

KeyManager exception in Spark 1.6.2

2016-08-31 Thread Eric Ho
scala:1106) at org.apache.spark.deploy.master.Master.main(Master.scala) -- -eric ho

Spark to Kafka communication encrypted ?

2016-08-31 Thread Eric Ho
I can't find in Spark 1.6.2's docs how to turn encryption on for Spark to Kafka communication ... I think that the Spark docs only tell you how to turn on encryption for communication between Spark nodes .. Am I wrong ? Thanks. -- -eric ho

Do we still need to use Kryo serializer in Spark 1.6.2 ?

2016-08-22 Thread Eric Ho
I heard that Kryo will get phased out at some point but I'm not sure in which Spark release. I'm using PySpark, does anyone have any docs on how to call / use the Kryo serializer in PySpark ? Thanks. -- -eric ho

Re: How to do nested for-each loops across RDDs ?

2016-08-15 Thread Eric Ho
to handle what you're asking about. > I would personally use something like cogroup or join between the two RDDs. If index matters, you can use zipWithIndex on both before you join and then see which indexes match up. > On Mon, Aug 15, 2016 at 1:15 PM Eric Ho <e...@analyt

How to do nested for-each loops across RDDs ?

2016-08-15 Thread Eric Ho
contain elements in array B as well as array A. Same argument for RDD(B). Any pointers much appreciated. Thanks. -- -eric ho

how to do nested loops over 2 arrays but use Two RDDs instead ?

2016-08-15 Thread Eric Ho
any RDD functions that would do this for me efficiently. I don't really want elements of RDD(A) and RDD(B) flying all over the network piecemeal... Thanks. -- -eric ho

com.datastax.spark % spark-streaming_2.10 % 1.1.0 in my build.sbt ??

2015-05-04 Thread Eric Ho
Can I specify this in my build file ? Thanks.

No logs from my cluster / worker ... (running DSE 4.6.1)

2015-05-04 Thread Eric Ho
I'm submitting this via 'dse spark-submit' but somehow, I don't see any logs on my cluster or worker machines... How can I find out ? My cluster is running DSE 4.6.1 with Spark enabled. My source is running Kafka 0.8.2.0. I'm launching my program on one of my DSE machines. Any insights much