Re: [Structured Spark streaming] How does the Cassandra connector readStream deal with deleted records

2020-06-26 Thread Russell Spitzer
The connector issues Java driver CQL requests under the hood, which means it
responds to the changing database the way a normal application would. This
means retries may return a different set of data than the original request
if the underlying database changed.
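
A minimal Scala sketch of this behaviour, assuming the org.apache.spark.sql.cassandra
data source and the db.hello keyspace/table names used elsewhere in this archive:

// In spark-shell, where `spark` is the SparkSession.
val df = spark.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "db", "table" -> "hello"))
  .load()

// Every action issues fresh CQL scans through the Java driver, so the rows
// returned reflect the table's state at scan time.
val before = df.count()   // first scan
// ... another client deletes or updates rows in Cassandra ...
val after  = df.count()   // a re-scan (e.g. after a task retry) may see different data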

On Fri, Jun 26, 2020, 9:42 PM Jungtaek Lim 
wrote:

> I'm not sure how it is implemented, but in general I wouldn't expect such
> behavior from connectors that read from non-streaming storage.
> The query result may depend on "when" the records are fetched.
>
> If you need to reflect the changes in your query, you'll probably want to
> find a way to retrieve "change logs" from your external storage (or have
> your system/product produce change logs if your external storage
> doesn't support it) and apply them to your query. The keyword to
> google for further reading is "Change Data Capture".
>
> Otherwise, you can apply the traditional approach: run a batch query
> periodically and replace the entire output.
>
> On Thu, Jun 25, 2020 at 1:26 PM Rahul Kumar 
> wrote:
>
>> Hello everyone,
>>
>> I was wondering how the Cassandra Spark connector deals with deleted/updated
>> records during a readStream operation. If a record was already fetched into
>> Spark memory and it then got updated or deleted in the database, does that get
>> reflected in a streaming join?
>>
>> Thanks,
>> Rahul
>>
>>
>>
>> --
>> Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
>>
>> -
>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>
>>


Re: [Structured Spark streaming] How does the Cassandra connector readStream deal with deleted records

2020-06-26 Thread Jungtaek Lim
I'm not sure how it is implemented, but in general I wouldn't expect such
behavior from connectors that read from non-streaming storage.
The query result may depend on "when" the records are fetched.

If you need to reflect the changes in your query, you'll probably want to
find a way to retrieve "change logs" from your external storage (or have
your system/product produce change logs if your external storage
doesn't support it) and apply them to your query. The keyword to
google for further reading is "Change Data Capture".

Otherwise, you can apply the traditional approach: run a batch query
periodically and replace the entire output.
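
A minimal Scala sketch of that periodic batch approach, assuming the
org.apache.spark.sql.cassandra data source; the output path and refresh
interval below are hypothetical:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("periodic-refresh").getOrCreate()

while (true) {
  // Re-read the full table so deletes/updates in Cassandra are picked up.
  val snapshot = spark.read
    .format("org.apache.spark.sql.cassandra")
    .options(Map("keyspace" -> "db", "table" -> "hello"))
    .load()

  // Replace the entire previous output rather than appending to it.
  snapshot.write.mode("overwrite").parquet("/tmp/hello_snapshot")

  Thread.sleep(10 * 60 * 1000)   // e.g. refresh every 10 minutes
}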

On Thu, Jun 25, 2020 at 1:26 PM Rahul Kumar  wrote:

> Hello everyone,
>
> I was wondering how the Cassandra Spark connector deals with deleted/updated
> records during a readStream operation. If a record was already fetched into
> Spark memory and it then got updated or deleted in the database, does that get
> reflected in a streaming join?
>
> Thanks,
> Rahul
>
>
>
> --
> Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
>
> -
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
>


[Structured Spark streaming] How does the Cassandra connector readStream deal with deleted records

2020-06-24 Thread Rahul Kumar
Hello everyone,

I was wondering how the Cassandra Spark connector deals with deleted/updated
records during a readStream operation. If a record was already fetched into
Spark memory and it then got updated or deleted in the database, does that get
reflected in a streaming join?

Thanks,
Rahul



--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: Apache Spark or Spark-Cassandra-Connector doesn't look like it is reading multiple partitions in parallel.

2016-11-26 Thread kant kodali
https://spark.apache.org/docs/2.0.0/api/java/org/apache/spark/sql/DataFrameReader.html#json(org.apache.spark.rdd.RDD)

You can pass an RDD to spark.read.json. // "spark" here is the SparkSession

Also, it works completely fine with a smaller dataset in a table, but with 1B
records it takes forever, and more importantly the network throughput is
only 2.2 KB/s, which is too low. It should be somewhere in the MB/s range.

On Sat, Nov 26, 2016 at 10:09 AM, Anastasios Zouzias <zouz...@gmail.com>
wrote:

> Hi there,
>
> spark.read.json usually takes a filesystem path (usually HDFS) where there
> is a file containing JSON per new line. See also
>
> http://spark.apache.org/docs/latest/sql-programming-guide.html
>
> Hence, in your case
>
> val df4 = spark.read.json(rdd) // This line takes forever
>
> seems wrong. I guess you might want to first store rdd as a text file on
> HDFS and then read it using spark.read.json .
>
> Cheers,
> Anastasios
>
>
>
> On Sat, Nov 26, 2016 at 9:34 AM, kant kodali <kanth...@gmail.com> wrote:
>
>> <http://stackoverflow.com/questions/40797231/apache-spark-or-spark-cassandra-connector-doesnt-look-like-it-is-reading-multipl?noredirect=1#>
>>
>> Apache Spark or Spark-Cassandra-Connector doesn't look like it is reading
>> multiple partitions in parallel.
>>
>> Here is my code using spark-shell
>>
>> import org.apache.spark.sql._
>> import org.apache.spark.sql.types.StringType
>> spark.sql("""CREATE TEMPORARY VIEW hello USING 
>> org.apache.spark.sql.cassandra OPTIONS (table "hello", keyspace "db", 
>> cluster "Test Cluster", pushdown "true")""")
>> val df = spark.sql("SELECT test from hello")
>> val df2 = df.select(df("test").cast(StringType).as("test"))
>> val rdd = df2.rdd.map { case Row(j: String) => j }
>> val df4 = spark.read.json(rdd) // This line takes forever
>>
>> I have about 700 million rows each row is about 1KB and this line
>>
>> val df4 = spark.read.json(rdd) takes forever as I get the following
>> output after 1hr 30 mins
>>
>> Stage 1:==> (4866 + 2) / 25256]
>>
>> so at this rate it will probably take days.
>>
>> I measured the network throughput rate of spark worker nodes using iftop
>> and it is about 2.2KB/s (kilobytes per second) which is too low so that
>> tells me it not reading partitions in parallel or at very least it is not
>> reading good chunk of data else it would be in MB/s. Any ideas on how to
>> fix it?
>>
>>
>
>
> --
> -- Anastasios Zouzias
> <a...@zurich.ibm.com>
>


Re: Apache Spark or Spark-Cassandra-Connector doesn't look like it is reading multiple partitions in parallel.

2016-11-26 Thread Anastasios Zouzias
Hi there,

spark.read.json usually takes a filesystem path (usually HDFS) pointing to files
that contain one JSON document per line. See also

http://spark.apache.org/docs/latest/sql-programming-guide.html

Hence, in your case

val df4 = spark.read.json(rdd) // This line takes forever

seems wrong. I guess you might want to first store the rdd as a text file on
HDFS and then read it back with spark.read.json.
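
A minimal Scala sketch of that suggestion, reusing the rdd of JSON strings from
the original post (the HDFS path is hypothetical):

// rdd is the RDD[String] built earlier: df2.rdd.map { case Row(j: String) => j }
rdd.saveAsTextFile("hdfs:///tmp/hello_json")

// Now spark.read.json works from files, one JSON document per line, instead of
// re-computing the Cassandra-backed RDD once for schema inference and again
// for the actual read.
val df4 = spark.read.json("hdfs:///tmp/hello_json")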

Cheers,
Anastasios



On Sat, Nov 26, 2016 at 9:34 AM, kant kodali <kanth...@gmail.com> wrote:

> <http://stackoverflow.com/questions/40797231/apache-spark-or-spark-cassandra-connector-doesnt-look-like-it-is-reading-multipl?noredirect=1#>
>
> Apache Spark or Spark-Cassandra-Connector doesn't look like it is reading
> multiple partitions in parallel.
>
> Here is my code using spark-shell
>
> import org.apache.spark.sql._
> import org.apache.spark.sql.types.StringType
> spark.sql("""CREATE TEMPORARY VIEW hello USING org.apache.spark.sql.cassandra 
> OPTIONS (table "hello", keyspace "db", cluster "Test Cluster", pushdown 
> "true")""")
> val df = spark.sql("SELECT test from hello")
> val df2 = df.select(df("test").cast(StringType).as("test"))
> val rdd = df2.rdd.map { case Row(j: String) => j }
> val df4 = spark.read.json(rdd) // This line takes forever
>
> I have about 700 million rows each row is about 1KB and this line
>
> val df4 = spark.read.json(rdd) takes forever as I get the following
> output after 1hr 30 mins
>
> Stage 1:==> (4866 + 2) / 25256]
>
> so at this rate it will probably take days.
>
> I measured the network throughput rate of spark worker nodes using iftop
> and it is about 2.2KB/s (kilobytes per second) which is too low so that
> tells me it not reading partitions in parallel or at very least it is not
> reading good chunk of data else it would be in MB/s. Any ideas on how to
> fix it?
>
>


-- 
-- Anastasios Zouzias
<a...@zurich.ibm.com>


Apache Spark or Spark-Cassandra-Connector doesn't look like it is reading multiple partitions in parallel.

2016-11-26 Thread kant kodali
<http://stackoverflow.com/questions/40797231/apache-spark-or-spark-cassandra-connector-doesnt-look-like-it-is-reading-multipl?noredirect=1#>

Apache Spark or Spark-Cassandra-Connector doesn't look like it is reading
multiple partitions in parallel.

Here is my code using spark-shell

import org.apache.spark.sql._
import org.apache.spark.sql.types.StringType
spark.sql("""CREATE TEMPORARY VIEW hello USING
org.apache.spark.sql.cassandra OPTIONS (table "hello", keyspace "db",
cluster "Test Cluster", pushdown "true")""")
val df = spark.sql("SELECT test from hello")
val df2 = df.select(df("test").cast(StringType).as("test"))
val rdd = df2.rdd.map { case Row(j: String) => j }
val df4 = spark.read.json(rdd) // This line takes forever

I have about 700 million rows, each about 1 KB, and this line

val df4 = spark.read.json(rdd) takes forever; I get the following output
after 1 hr 30 min:

Stage 1:==> (4866 + 2) / 25256]

so at this rate it will probably take days.

I measured the network throughput of the Spark worker nodes using iftop
and it is about 2.2 KB/s (kilobytes per second), which is too low. That
tells me it is not reading partitions in parallel, or at the very least it is
not reading a good chunk of data; otherwise it would be in MB/s. Any ideas on
how to fix it?


Re: Python - Spark Cassandra Connector on DC/OS

2016-11-01 Thread Andrew Holway
Sorry: Spark 2.0.0

On Tue, Nov 1, 2016 at 10:04 AM, Andrew Holway <
andrew.hol...@otternetworks.de> wrote:

> Hello,
>
> I've been getting pretty serious with DC/OS which I guess could be
> described as a somewhat polished distribution of Mesos. I'm not sure how
> relevant DC/OS is to this problem.
>
> I am using this pyspark program to test the cassandra connection:
> http://bit.ly/2eWAfxm (github)
>
> I can see that the df.printSchema() method is working OK, but the df.show()
> method is breaking with this error:
>
> Traceback (most recent call last):
>   File "/mnt/mesos/sandbox/squeeze.py", line 28, in 
> df.show()
>   File "/opt/spark/dist/python/lib/pyspark.zip/pyspark/sql/dataframe.py",
> line 287, in show
>   File "/opt/spark/dist/python/lib/py4j-0.10.1-src.zip/py4j/java_gateway.py",
> line 933, in __call__
>   File "/opt/spark/dist/python/lib/pyspark.zip/pyspark/sql/utils.py",
> line 63, in deco
>   File "/opt/spark/dist/python/lib/py4j-0.10.1-src.zip/py4j/protocol.py",
> line 312, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling
> o33.showString.
>
> Full output to stdout and stderr. : http://bit.ly/2f80f9e (gist)
>
> Versions:
>
> Spark 2.0.1
> Python Version: 3.4.3 (default, Sep 14 2016, 12:36:27)
> [cqlsh 5.0.1 | Cassandra 2.2.8 | CQL spec 3.3.1 | Native protocol v4]
> DC/OS v.1.8.4
>
> Cheers,
>
> Andrew
>
> --
> Otter Networks UG
> http://otternetworks.de
> Gotenstraße 17
> 10829 Berlin
>



-- 
Otter Networks UG
http://otternetworks.de
Gotenstraße 17
10829 Berlin


Python - Spark Cassandra Connector on DC/OS

2016-11-01 Thread Andrew Holway
Hello,

I've been getting pretty serious with DC/OS which I guess could be
described as a somewhat polished distribution of Mesos. I'm not sure how
relevant DC/OS is to this problem.

I am using this pyspark program to test the cassandra connection:
http://bit.ly/2eWAfxm (github)

I can see that the df.printSchema() method is working OK, but the df.show()
method is breaking with this error:

Traceback (most recent call last):
  File "/mnt/mesos/sandbox/squeeze.py", line 28, in 
df.show()
  File "/opt/spark/dist/python/lib/pyspark.zip/pyspark/sql/dataframe.py",
line 287, in show
  File
"/opt/spark/dist/python/lib/py4j-0.10.1-src.zip/py4j/java_gateway.py", line
933, in __call__
  File "/opt/spark/dist/python/lib/pyspark.zip/pyspark/sql/utils.py", line
63, in deco
  File "/opt/spark/dist/python/lib/py4j-0.10.1-src.zip/py4j/protocol.py",
line 312, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o33.showString.

Full output to stdout and stderr. : http://bit.ly/2f80f9e (gist)

Versions:

Spark 2.0.1
Python Version: 3.4.3 (default, Sep 14 2016, 12:36:27)
[cqlsh 5.0.1 | Cassandra 2.2.8 | CQL spec 3.3.1 | Native protocol v4]
DC/OS v.1.8.4

Cheers,

Andrew

-- 
Otter Networks UG
http://otternetworks.de
Gotenstraße 17
10829 Berlin


Re: unresolved dependency: datastax#spark-cassandra-connector;2.0.0-s_2.11-M3-20-g75719df: not found

2016-09-21 Thread Kevin Mellott
The "unresolved dependency" error is stating that the datastax dependency
could not be located in the Maven repository. I believe that this should
work if you change that portion of your command to the following.

--packages com.datastax.spark:spark-cassandra-connector_2.10:2.0.0-M3

You can verify the available versions by searching Maven at
http://search.maven.org.

Thanks,
Kevin
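
Putting that together, a minimal sketch of the full spark-shell invocation using
the coordinate suggested above, keeping the connection.host setting from the
original command (verify the exact version on Maven before using it):

spark-shell --packages com.datastax.spark:spark-cassandra-connector_2.10:2.0.0-M3 \
  --conf spark.cassandra.connection.host=localhost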

On Wed, Sep 21, 2016 at 3:38 AM, muhammet pakyürek <mpa...@hotmail.com>
wrote:

> while i run the spark-shell as below
>
> spark-shell --jars '/home/ktuser/spark-cassandra-
> connector/target/scala-2.11/root_2.11-2.0.0-M3-20-g75719df.jar'
> --packages datastax:spark-cassandra-connector:2.0.0-s_2.11-M3-20-g75719df
> --conf spark.cassandra.connection.host=localhost
>
> i get the error
> unresolved dependency: datastax#spark-cassandra-
> connector;2.0.0-s_2.11-M3-20-g75719df.
>
>
> the second question even if i added
>
> libraryDependencies += "com.datastax.spark" %% "spark-cassandra-connector" % 
> "2.0.0-M3"
>
> to spark-cassandra-connector/sbt/sbt file jar files are
> root_2.11-2.0.0-M3-20-g75719df
>
>
> teh third question after build of connectpr scala 2.11 how do i integrate
> it with pyspark?
>
>


unresolved dependency: datastax#spark-cassandra-connector;2.0.0-s_2.11-M3-20-g75719df: not found

2016-09-21 Thread muhammet pakyürek
When I run spark-shell as below,

spark-shell --jars 
'/home/ktuser/spark-cassandra-connector/target/scala-2.11/root_2.11-2.0.0-M3-20-g75719df.jar'
 --packages datastax:spark-cassandra-connector:2.0.0-s_2.11-M3-20-g75719df 
--conf spark.cassandra.connection.host=localhost

I get the error:
unresolved dependency: 
datastax#spark-cassandra-connector;2.0.0-s_2.11-M3-20-g75719df.


The second question: even though I added

libraryDependencies += "com.datastax.spark" %% "spark-cassandra-connector" % 
"2.0.0-M3"

to the spark-cassandra-connector/sbt/sbt file, the jar files are
root_2.11-2.0.0-M3-20-g75719df.


The third question: after building the connector for Scala 2.11, how do I integrate it
with pyspark?



Is Cassandra 3.7 compatible with the DataStax Spark Cassandra Connector 2.0?

2016-09-19 Thread muhammet pakyürek





Re: clear steps for installation of spark, cassandra and cassandra connector to run on spyder 2.3.7 using python 3.5 and anaconda 2.4 ipython 4.0

2016-09-06 Thread ayan guha
Spark has pretty extensive documentation; that should be your starting
point. I do not use Cassandra much, but the Cassandra connector should be a
Spark package, so look for the Spark Packages website.

If I may say so, all docs should be one or two Google searches away :)
On 6 Sep 2016 20:34, "muhammet pakyürek" <mpa...@hotmail.com> wrote:

>
>
> could u send me  documents and links to satisfy all above requirements of 
> installation
> of spark, cassandra and cassandra connector to run on spyder 2.3.7 using
> python 3.5 and anaconda 2.4 ipython 4.0
>
>
> --
>
>


clear steps for installation of spark, cassandra and cassandra connector to run on spyder 2.3.7 using python 3.5 and anaconda 2.4 ipython 4.0

2016-09-06 Thread muhammet pakyürek


Could you send me documents and links covering all of the above requirements:
installation of Spark, Cassandra, and the Cassandra connector to run on Spyder 2.3.7
using Python 3.5 and Anaconda 2.4 / IPython 4.0?






Spark-Cassandra connector

2016-06-21 Thread Joaquin Alzola
Hi List

I am trying to install the Spark-Cassandra connector through Maven or sbt, but
neither works.
Both of them try to connect to the Internet (to which I do not have a connection)
to download certain files.

Is there a way to install the files manually?

I downloaded spark-cassandra-connector_2.10-1.6.0.jar from the Maven repository,
which matches the Scala and Spark versions that I have ... but where do I put it?

BR

Joaquin
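
A minimal sketch of running with a locally downloaded jar, with no repository
access needed; note that --jars adds only that one jar, so the connector's
transitive dependencies (e.g. the DataStax Java driver) must also be supplied
the same way or bundled into an assembly jar (paths below are hypothetical):

spark-shell --jars /opt/jars/spark-cassandra-connector_2.10-1.6.0.jar \
  --conf spark.cassandra.connection.host=127.0.0.1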


Re: pyspark spark-cassandra-connector java.io.IOException: Failed to open native connection to Cassandra at {192.168.1.126}:9042

2016-03-09 Thread Andy Davidson
Hi Ted and Saurabh,

If I use --conf arguments with pyspark I am able to connect. Any idea how I
can set these values programmatically? (I work on a notebook server and
cannot easily reconfigure the server.)

This works

extraPkgs="--packages com.databricks:spark-csv_2.11:1.3.0 \
--packages datastax:spark-cassandra-connector:1.6.0-M1-s_2.10"

export PYSPARK_PYTHON=python3
export PYSPARK_DRIVER_PYTHON=python3
IPYTHON_OPTS=notebook $SPARK_ROOT/bin/pyspark $extraPkgs --conf
spark.cassandra.connection.host=localhost --conf
spark.cassandra.connection.port=9043 $*


df = sqlContext.read\
.format("org.apache.spark.sql.cassandra")\
.options(table="json_timeseries", keyspace="notification")\
.load()
df.printSchema()
df.show(truncate=False)


I have tried using sqlContext.setConf(), but it does not work. It does not
seem to have any effect.

#sqlContext.setConf("spark.cassandra.connection.host","localhost")
#sqlContext.setConf("spark.cassandra.connection.port","9043")

#sqlContext.setConf("connection.host","localhost")
#sqlContext.setConf("connection.port","9043")

sqlContext.setConf("host","localhost")
sqlContext.setConf("port","9043”)

Thanks

Andy

From:  Saurabh Bajaj <bajaj.onl...@gmail.com>
Date:  Tuesday, March 8, 2016 at 9:13 PM
To:  Andrew Davidson <a...@santacruzintegration.com>
Cc:  Ted Yu <yuzhih...@gmail.com>, "user @spark" <user@spark.apache.org>
Subject:  Re: pyspark spark-cassandra-connector java.io.IOException: Failed
to open native connection to Cassandra at {192.168.1.126}:9042

> Hi Andy, 
> 
> I believe you need to set the host and port settings separately
> spark.cassandra.connection.host
> spark.cassandra.connection.port
> https://github.com/datastax/spark-cassandra-connector/blob/master/doc/referenc
> e.md#cassandra-connection-parameters
> 
> Looking at the logs, it seems your port config is not being set and it's
> falling back to default.
> Let me know if that helps.
> 
> Saurabh Bajaj
> 
> On Tue, Mar 8, 2016 at 6:25 PM, Andy Davidson <a...@santacruzintegration.com>
> wrote:
>> Hi Ted
>> 
>> I believe by default cassandra listens on 9042
>> 
>> From:  Ted Yu <yuzhih...@gmail.com>
>> Date:  Tuesday, March 8, 2016 at 6:11 PM
>> To:  Andrew Davidson <a...@santacruzintegration.com>
>> Cc:  "user @spark" <user@spark.apache.org>
>> Subject:  Re: pyspark spark-cassandra-connector java.io.IOException: Failed
>> to open native connection to Cassandra at {192.168.1.126}:9042
>> 
>>> Have you contacted spark-cassandra-connector related mailing list ?
>>> 
>>> I wonder where the port 9042 came from.
>>> 
>>> Cheers
>>> 
>>> On Tue, Mar 8, 2016 at 6:02 PM, Andy Davidson
>>> <a...@santacruzintegration.com> wrote:
>>>> 
>>>> I am using spark-1.6.0-bin-hadoop2.6. I am trying to write a python
>>>> notebook that reads a data frame from Cassandra.
>>>> 
>>>> I connect to cassadra using an ssh tunnel running on port 9043. CQLSH works
>>>> how ever I can not figure out how to configure my notebook. I have tried
>>>> various hacks any idea what I am doing wrong
>>>> 
>>>> : java.io.IOException: Failed to open native connection to Cassandra at
>>>> {192.168.1.126}:9042
>>>> 
>>>> 
>>>> 
>>>> Thanks in advance
>>>> 
>>>> Andy
>>>> 
>>>> 
>>>> 
>>>> $ extraPkgs="--packages com.databricks:spark-csv_2.11:1.3.0 \
>>>> --packages datastax:spark-cassandra-connector:1.6.0-M1-s_2.11"
>>>> 
>>>> $ export PYSPARK_PYTHON=python3
>>>> $ export PYSPARK_DRIVER_PYTHON=python3
>>>> $ IPYTHON_OPTS=notebook $SPARK_ROOT/bin/pyspark $extraPkgs $*
>>>> 
>>>> 
>>>> 
>>>> In [15]:
>>>> 1
>>>> sqlContext.setConf("spark.cassandra.connection.host”,”127.0.0.1:9043
>>>> <http://127.0.0.1:9043> ")
>>>> 2
>>>> df = sqlContext.read\
>>>> 3
>>>> .format("org.apache.spark.sql.cassandra")\
>>>> 4
>>>> .options(table=“time_series", keyspace="notification")\
>>>> 5
>>>> .load()
>>>> 6
>>>> ​
>>>> 7
>>>> df.printSchema()
>>>> 8
>>>> df.show()
>>>> 

Re: pyspark spark-cassandra-connector java.io.IOException: Failed to open native connection to Cassandra at {192.168.1.126}:9042

2016-03-08 Thread Ted Yu
From cassandra.yaml:

native_transport_port: 9042

FYI

On Tue, Mar 8, 2016 at 9:13 PM, Saurabh Bajaj <bajaj.onl...@gmail.com>
wrote:

> Hi Andy,
>
> I believe you need to set the host and port settings separately
> spark.cassandra.connection.host
> spark.cassandra.connection.port
>
> https://github.com/datastax/spark-cassandra-connector/blob/master/doc/reference.md#cassandra-connection-parameters
>
> Looking at the logs, it seems your port config is not being set and it's
> falling back to default.
> Let me know if that helps.
>
> Saurabh Bajaj
>
> On Tue, Mar 8, 2016 at 6:25 PM, Andy Davidson <
> a...@santacruzintegration.com> wrote:
>
>> Hi Ted
>>
>> I believe by default cassandra listens on 9042
>>
>> From: Ted Yu <yuzhih...@gmail.com>
>> Date: Tuesday, March 8, 2016 at 6:11 PM
>> To: Andrew Davidson <a...@santacruzintegration.com>
>> Cc: "user @spark" <user@spark.apache.org>
>> Subject: Re: pyspark spark-cassandra-connector java.io.IOException:
>> Failed to open native connection to Cassandra at {192.168.1.126}:9042
>>
>> Have you contacted spark-cassandra-connector related mailing list ?
>>
>> I wonder where the port 9042 came from.
>>
>> Cheers
>>
>> On Tue, Mar 8, 2016 at 6:02 PM, Andy Davidson <
>> a...@santacruzintegration.com> wrote:
>>
>>>
>>> I am using spark-1.6.0-bin-hadoop2.6. I am trying to write a python
>>> notebook that reads a data frame from Cassandra.
>>>
>>> *I connect to cassadra using an ssh tunnel running on port 9043.* CQLSH
>>> works how ever I can not figure out how to configure my notebook. I have
>>> tried various hacks any idea what I am doing wrong
>>>
>>> : java.io.IOException: Failed to open native connection to Cassandra at 
>>> {192.168.1.126}:9042
>>>
>>>
>>>
>>>
>>> Thanks in advance
>>>
>>> Andy
>>>
>>>
>>>
>>> $ extraPkgs="--packages com.databricks:spark-csv_2.11:1.3.0 \
>>> --packages
>>> datastax:spark-cassandra-connector:1.6.0-M1-s_2.11"
>>>
>>> $ export PYSPARK_PYTHON=python3
>>> $ export PYSPARK_DRIVER_PYTHON=python3
>>> $ IPYTHON_OPTS=notebook $SPARK_ROOT/bin/pyspark $extraPkgs $*
>>>
>>>
>>>
>>> In [15]:
>>> 1
>>>
>>> sqlContext.setConf("spark.cassandra.connection.host”,”127.0.0.1:9043")
>>>
>>> 2
>>>
>>> df = sqlContext.read\
>>>
>>> 3
>>>
>>> .format("org.apache.spark.sql.cassandra")\
>>>
>>> 4
>>>
>>> .options(table=“time_series", keyspace="notification")\
>>>
>>> 5
>>>
>>> .load()
>>>
>>> 6
>>>
>>> ​
>>>
>>> 7
>>>
>>> df.printSchema()
>>>
>>> 8
>>>
>>> df.show()
>>>
>>> ---Py4JJavaError
>>>  Traceback (most recent call 
>>> last) in ()  1 
>>> sqlContext.setConf("spark.cassandra.connection.host","localhost:9043")> 
>>> 2 df = sqlContext.read.format("org.apache.spark.sql.cassandra")
>>> .options(table="kv", keyspace="notification").load()  3   4 
>>> df.printSchema()  5 
>>> df.show()/Users/andrewdavidson/workSpace/spark/spark-1.6.0-bin-hadoop2.6/python/pyspark/sql/readwriter.py
>>>  in load(self, path, format, schema, **options)137 
>>> return self._df(self._jreader.load(path))138 else:--> 139   
>>>   return self._df(self._jreader.load())140 141 
>>> @since(1.4)/Users/andrewdavidson/workSpace/spark/spark-1.6.0-bin-hadoop2.6/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py
>>>  in __call__(self, *args)811 answer = 
>>> self.gateway_client.send_command(command)812 return_value = 
>>> get_return_value(--> 813 answer, self.gateway_client, 
>>> self.target_id, self.name)814 815 for temp_arg in 
>>> temp_args:/Users/andrewdavidson/workSpace/spark/spark-1.6.0-bin-hadoop2.6/python/pyspark/sql/utils.py
>>>  in deco(*a, **kw) 43 def deco(*a, **kw): 44 try:---> 
>>> 45 return f(*a, **kw) 46 except 

Re: pyspark spark-cassandra-connector java.io.IOException: Failed to open native connection to Cassandra at {192.168.1.126}:9042

2016-03-08 Thread Saurabh Bajaj
Hi Andy,

I believe you need to set the host and port settings separately
spark.cassandra.connection.host
spark.cassandra.connection.port
https://github.com/datastax/spark-cassandra-connector/blob/master/doc/reference.md#cassandra-connection-parameters

Looking at the logs, it seems your port config is not being set and it's
falling back to default.
Let me know if that helps.

Saurabh Bajaj
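
A minimal sketch of setting the two properties separately when launching pyspark
(the same shape Andy later reports working; the tunnel port is from this thread):

pyspark --packages datastax:spark-cassandra-connector:1.6.0-M1-s_2.10 \
  --conf spark.cassandra.connection.host=127.0.0.1 \
  --conf spark.cassandra.connection.port=9043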

On Tue, Mar 8, 2016 at 6:25 PM, Andy Davidson <a...@santacruzintegration.com
> wrote:

> Hi Ted
>
> I believe by default cassandra listens on 9042
>
> From: Ted Yu <yuzhih...@gmail.com>
> Date: Tuesday, March 8, 2016 at 6:11 PM
> To: Andrew Davidson <a...@santacruzintegration.com>
> Cc: "user @spark" <user@spark.apache.org>
> Subject: Re: pyspark spark-cassandra-connector java.io.IOException:
> Failed to open native connection to Cassandra at {192.168.1.126}:9042
>
> Have you contacted spark-cassandra-connector related mailing list ?
>
> I wonder where the port 9042 came from.
>
> Cheers
>
> On Tue, Mar 8, 2016 at 6:02 PM, Andy Davidson <
> a...@santacruzintegration.com> wrote:
>
>>
>> I am using spark-1.6.0-bin-hadoop2.6. I am trying to write a python
>> notebook that reads a data frame from Cassandra.
>>
>> *I connect to cassadra using an ssh tunnel running on port 9043.* CQLSH
>> works how ever I can not figure out how to configure my notebook. I have
>> tried various hacks any idea what I am doing wrong
>>
>> : java.io.IOException: Failed to open native connection to Cassandra at 
>> {192.168.1.126}:9042
>>
>>
>>
>>
>> Thanks in advance
>>
>> Andy
>>
>>
>>
>> $ extraPkgs="--packages com.databricks:spark-csv_2.11:1.3.0 \
>> --packages datastax:spark-cassandra-connector:1.6.0-M1-s_2.11"
>>
>> $ export PYSPARK_PYTHON=python3
>> $ export PYSPARK_DRIVER_PYTHON=python3
>> $ IPYTHON_OPTS=notebook $SPARK_ROOT/bin/pyspark $extraPkgs $*
>>
>>
>>
>> In [15]:
>> 1
>>
>> sqlContext.setConf("spark.cassandra.connection.host”,”127.0.0.1:9043")
>>
>> 2
>>
>> df = sqlContext.read\
>>
>> 3
>>
>> .format("org.apache.spark.sql.cassandra")\
>>
>> 4
>>
>> .options(table=“time_series", keyspace="notification")\
>>
>> 5
>>
>> .load()
>>
>> 6
>>
>> ​
>>
>> 7
>>
>> df.printSchema()
>>
>> 8
>>
>> df.show()
>>
>> ---Py4JJavaError
>>  Traceback (most recent call 
>> last) in ()  1 
>> sqlContext.setConf("spark.cassandra.connection.host","localhost:9043")> 
>> 2 df = sqlContext.read.format("org.apache.spark.sql.cassandra")
>> .options(table="kv", keyspace="notification").load()  3   4 
>> df.printSchema()  5 
>> df.show()/Users/andrewdavidson/workSpace/spark/spark-1.6.0-bin-hadoop2.6/python/pyspark/sql/readwriter.py
>>  in load(self, path, format, schema, **options)137 
>> return self._df(self._jreader.load(path))138 else:--> 139
>>  return self._df(self._jreader.load())140 141 
>> @since(1.4)/Users/andrewdavidson/workSpace/spark/spark-1.6.0-bin-hadoop2.6/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py
>>  in __call__(self, *args)811 answer = 
>> self.gateway_client.send_command(command)812 return_value = 
>> get_return_value(--> 813 answer, self.gateway_client, 
>> self.target_id, self.name)814 815 for temp_arg in 
>> temp_args:/Users/andrewdavidson/workSpace/spark/spark-1.6.0-bin-hadoop2.6/python/pyspark/sql/utils.py
>>  in deco(*a, **kw) 43 def deco(*a, **kw): 44 try:---> 45 
>> return f(*a, **kw) 46 except 
>> py4j.protocol.Py4JJavaError as e: 47 s = 
>> e.java_exception.toString()/Users/andrewdavidson/workSpace/spark/spark-1.6.0-bin-hadoop2.6/python/lib/py4j-0.9-src.zip/py4j/protocol.py
>>  in get_return_value(answer, gateway_client, target_id, name)306 
>> raise Py4JJavaError(307 "An error occurred 
>> while calling {0}{1}{2}.\n".--> 308 format(target_id, 
>> ".", name), value)
>> 309 else:310 raise Py4JError(
>

Re: pyspark spark-cassandra-connector java.io.IOException: Failed to open native connection to Cassandra at {192.168.1.126}:9042

2016-03-08 Thread Andy Davidson
Hi Ted

I believe by default cassandra listens on 9042

From:  Ted Yu <yuzhih...@gmail.com>
Date:  Tuesday, March 8, 2016 at 6:11 PM
To:  Andrew Davidson <a...@santacruzintegration.com>
Cc:  "user @spark" <user@spark.apache.org>
Subject:  Re: pyspark spark-cassandra-connector java.io.IOException: Failed
to open native connection to Cassandra at {192.168.1.126}:9042

> Have you contacted spark-cassandra-connector related mailing list ?
> 
> I wonder where the port 9042 came from.
> 
> Cheers
> 
> On Tue, Mar 8, 2016 at 6:02 PM, Andy Davidson <a...@santacruzintegration.com>
> wrote:
>> 
>> I am using spark-1.6.0-bin-hadoop2.6. I am trying to write a python notebook
>> that reads a data frame from Cassandra.
>> 
>> I connect to cassadra using an ssh tunnel running on port 9043. CQLSH works
>> how ever I can not figure out how to configure my notebook. I have tried
>> various hacks any idea what I am doing wrong
>> 
>> : java.io.IOException: Failed to open native connection to Cassandra at
>> {192.168.1.126}:9042
>> 
>> 
>> 
>> Thanks in advance
>> 
>> Andy
>> 
>> 
>> 
>> $ extraPkgs="--packages com.databricks:spark-csv_2.11:1.3.0 \
>> --packages datastax:spark-cassandra-connector:1.6.0-M1-s_2.11"
>> 
>> $ export PYSPARK_PYTHON=python3
>> $ export PYSPARK_DRIVER_PYTHON=python3
>> $ IPYTHON_OPTS=notebook $SPARK_ROOT/bin/pyspark $extraPkgs $*
>> 
>> 
>> 
>> In [15]:
>> 1
>> sqlContext.setConf("spark.cassandra.connection.host”,”127.0.0.1:9043
>> <http://127.0.0.1:9043> ")
>> 2
>> df = sqlContext.read\
>> 3
>> .format("org.apache.spark.sql.cassandra")\
>> 4
>> .options(table=“time_series", keyspace="notification")\
>> 5
>> .load()
>> 6
>> ​
>> 7
>> df.printSchema()
>> 8
>> df.show()
>> ---Py
>> 4JJavaError Traceback (most recent call last)
>>  in ()  1
>> sqlContext.setConf("spark.cassandra.connection.host","localhost:9043")> 2
>> df = sqlContext.read.format("org.apache.spark.sql.cassandra")
>> .options(table="kv", keyspace="notification").load()  3   4
>> df.printSchema()  5
>> df.show()/Users/andrewdavidson/workSpace/spark/spark-1.6.0-bin-hadoop2.6/pyth
>> on/pyspark/sql/readwriter.py in load(self, path, format, schema, **options)
>> 137 return self._df(self._jreader.load(path))138
>> else:--> 139 return self._df(self._jreader.load())140 141
>> @since(1.4)/Users/andrewdavidson/workSpace/spark/spark-1.6.0-bin-hadoop2.6/py
>> thon/lib/py4j-0.9-src.zip/py4j/java_gateway.py in __call__(self, *args)
>> 811 answer = self.gateway_client.send_command(command)812
>> return_value = get_return_value(
>> --> 813 answer, self.gateway_client, self.target_id, self.name
>> <http://self.name> )
>> 814 815 for temp_arg in
>> temp_args:/Users/andrewdavidson/workSpace/spark/spark-1.6.0-bin-hadoop2.6/pyt
>> hon/pyspark/sql/utils.py in deco(*a, **kw) 43 def deco(*a, **kw):
>> 44 try:---> 45 return f(*a, **kw) 46 except
>> py4j.protocol.Py4JJavaError as e: 47 s =
>> e.java_exception.toString()/Users/andrewdavidson/workSpace/spark/spark-1.6.0-
>> bin-hadoop2.6/python/lib/py4j-0.9-src.zip/py4j/protocol.py in
>> get_return_value(answer, gateway_client, target_id, name)306
>> raise Py4JJavaError(
>> 307 "An error occurred while calling
>> {0}{1}{2}.\n".--> 308 format(target_id, ".", name),
>> value)
>> 309 else:310 raise Py4JError(
>> 
>> Py4JJavaError: An error occurred while calling o280.load.
>> : java.io.IOException: Failed to open native connection to Cassandra at
>> {192.168.1.126}:9042
>>  at 
>> com.datastax.spark.connector.cql.CassandraConnector$.com$datastax$spark$conne
>> ctor$cql$CassandraConnector$$createSession(CassandraConnector.scala:162)
>>  at 
>> com.datastax.spark.connector.cql.CassandraConnector$$anonfun$2.apply(Cassandr
>> aConnector.scala:148)
>>  at 
>> com.datastax.spark.connector.cql.CassandraConnector$$anonfun$2.apply(Cassandr
>> aConnector.scala:148)
>>  at 
>> com.datastax.spark.connector.cql

Re: pyspark spark-cassandra-connector java.io.IOException: Failed to open native connection to Cassandra at {192.168.1.126}:9042

2016-03-08 Thread Ted Yu
Have you contacted spark-cassandra-connector related mailing list ?

I wonder where the port 9042 came from.

Cheers

On Tue, Mar 8, 2016 at 6:02 PM, Andy Davidson <a...@santacruzintegration.com
> wrote:

>
> I am using spark-1.6.0-bin-hadoop2.6. I am trying to write a python
> notebook that reads a data frame from Cassandra.
>
> *I connect to cassadra using an ssh tunnel running on port 9043.* CQLSH
> works how ever I can not figure out how to configure my notebook. I have
> tried various hacks any idea what I am doing wrong
>
> : java.io.IOException: Failed to open native connection to Cassandra at 
> {192.168.1.126}:9042
>
>
>
>
> Thanks in advance
>
> Andy
>
>
>
> $ extraPkgs="--packages com.databricks:spark-csv_2.11:1.3.0 \
> --packages datastax:spark-cassandra-connector:1.6.0-M1-s_2.11"
>
> $ export PYSPARK_PYTHON=python3
> $ export PYSPARK_DRIVER_PYTHON=python3
> $ IPYTHON_OPTS=notebook $SPARK_ROOT/bin/pyspark $extraPkgs $*
>
>
>
> In [15]:
> 1
>
> sqlContext.setConf("spark.cassandra.connection.host”,”127.0.0.1:9043")
>
> 2
>
> df = sqlContext.read\
>
> 3
>
> .format("org.apache.spark.sql.cassandra")\
>
> 4
>
> .options(table=“time_series", keyspace="notification")\
>
> 5
>
> .load()
>
> 6
>
> ​
>
> 7
>
> df.printSchema()
>
> 8
>
> df.show()
>
> ---Py4JJavaError
>  Traceback (most recent call 
> last) in ()  1 
> sqlContext.setConf("spark.cassandra.connection.host","localhost:9043")> 2 
> df = sqlContext.read.format("org.apache.spark.sql.cassandra")
> .options(table="kv", keyspace="notification").load()  3   4 
> df.printSchema()  5 df.show()
> /Users/andrewdavidson/workSpace/spark/spark-1.6.0-bin-hadoop2.6/python/pyspark/sql/readwriter.py
>  in load(self, path, format, schema, **options)137 return 
> self._df(self._jreader.load(path))138 else:--> 139 
> return self._df(self._jreader.load())140 141 @since(1.4)
> /Users/andrewdavidson/workSpace/spark/spark-1.6.0-bin-hadoop2.6/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py
>  in __call__(self, *args)811 answer = 
> self.gateway_client.send_command(command)812 return_value = 
> get_return_value(--> 813 answer, self.gateway_client, 
> self.target_id, self.name)814 815 for temp_arg in temp_args:
> /Users/andrewdavidson/workSpace/spark/spark-1.6.0-bin-hadoop2.6/python/pyspark/sql/utils.py
>  in deco(*a, **kw) 43 def deco(*a, **kw): 44 try:---> 45  
>return f(*a, **kw) 46 except 
> py4j.protocol.Py4JJavaError as e: 47 s = 
> e.java_exception.toString()
> /Users/andrewdavidson/workSpace/spark/spark-1.6.0-bin-hadoop2.6/python/lib/py4j-0.9-src.zip/py4j/protocol.py
>  in get_return_value(answer, gateway_client, target_id, name)306  
>raise Py4JJavaError(307 "An error occurred 
> while calling {0}{1}{2}.\n".--> 308 format(target_id, 
> ".", name), value)309 else:310 raise 
> Py4JError(
> Py4JJavaError: An error occurred while calling o280.load.
> : java.io.IOException: Failed to open native connection to Cassandra at 
> {192.168.1.126}:9042
>   at 
> com.datastax.spark.connector.cql.CassandraConnector$.com$datastax$spark$connector$cql$CassandraConnector$$createSession(CassandraConnector.scala:162)
>   at 
> com.datastax.spark.connector.cql.CassandraConnector$$anonfun$2.apply(CassandraConnector.scala:148)
>   at 
> com.datastax.spark.connector.cql.CassandraConnector$$anonfun$2.apply(CassandraConnector.scala:148)
>   at 
> com.datastax.spark.connector.cql.RefCountedCache.createNewValueAndKeys(RefCountedCache.scala:31)
>   at 
> com.datastax.spark.connector.cql.RefCountedCache.acquire(RefCountedCache.scala:56)
>   at 
> com.datastax.spark.connector.cql.CassandraConnector.openSession(CassandraConnector.scala:81)
>   at 
> com.datastax.spark.connector.cql.CassandraConnector.withSessionDo(CassandraConnector.scala:109)
>   at 
> com.datastax.spark.connector.rdd.partitioner.CassandraRDDPartitioner$.getTokenFactory(CassandraRDDPartitioner.scala:184)
>   at 
> org.apache.spark.sql.cassandra.CassandraSourceRelation$.apply(CassandraSourceRelation.scala:267)
>   at 
> org.apache.spark.sql.cassandra.DefaultSource.createRelation(Defaul

pyspark spark-cassandra-connector java.io.IOException: Failed to open native connection to Cassandra at {192.168.1.126}:9042

2016-03-08 Thread Andy Davidson

I am using spark-1.6.0-bin-hadoop2.6. I am trying to write a Python notebook
that reads a data frame from Cassandra.

I connect to Cassandra using an SSH tunnel running on port 9043. CQLSH works;
however, I cannot figure out how to configure my notebook. I have tried
various hacks. Any idea what I am doing wrong?

: java.io.IOException: Failed to open native connection to Cassandra at
{192.168.1.126}:9042



Thanks in advance

Andy



$ extraPkgs="--packages com.databricks:spark-csv_2.11:1.3.0 \
--packages datastax:spark-cassandra-connector:1.6.0-M1-s_2.11"

$ export PYSPARK_PYTHON=python3
$ export PYSPARK_DRIVER_PYTHON=python3
$ IPYTHON_OPTS=notebook $SPARK_ROOT/bin/pyspark $extraPkgs $*



In [15]:
sqlContext.setConf("spark.cassandra.connection.host","127.0.0.1:9043")
df = sqlContext.read\
.format("org.apache.spark.sql.cassandra")\
.options(table="time_series", keyspace="notification")\
.load()

df.printSchema()
df.show()
---
Py4JJavaError Traceback (most recent call last)
 in ()
  1 
sqlContext.setConf("spark.cassandra.connection.host","localhost:9043")
> 2 df = sqlContext.read.format("org.apache.spark.sql.cassandra")
.options(table="kv", keyspace="notification").load()
  3 
  4 df.printSchema()
  5 df.show()

/Users/andrewdavidson/workSpace/spark/spark-1.6.0-bin-hadoop2.6/python/pyspa
rk/sql/readwriter.py in load(self, path, format, schema, **options)
137 return self._df(self._jreader.load(path))
138 else:
--> 139 return self._df(self._jreader.load())
140 
141 @since(1.4)

/Users/andrewdavidson/workSpace/spark/spark-1.6.0-bin-hadoop2.6/python/lib/p
y4j-0.9-src.zip/py4j/java_gateway.py in __call__(self, *args)
811 answer = self.gateway_client.send_command(command)
812 return_value = get_return_value(
--> 813 answer, self.gateway_client, self.target_id, self.name)
814 
815 for temp_arg in temp_args:

/Users/andrewdavidson/workSpace/spark/spark-1.6.0-bin-hadoop2.6/python/pyspa
rk/sql/utils.py in deco(*a, **kw)
 43 def deco(*a, **kw):
 44 try:
---> 45 return f(*a, **kw)
 46 except py4j.protocol.Py4JJavaError as e:
 47 s = e.java_exception.toString()

/Users/andrewdavidson/workSpace/spark/spark-1.6.0-bin-hadoop2.6/python/lib/p
y4j-0.9-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client,
target_id, name)
306 raise Py4JJavaError(
307 "An error occurred while calling {0}{1}{2}.\n".
--> 308 format(target_id, ".", name), value)
309 else:
310 raise Py4JError(

Py4JJavaError: An error occurred while calling o280.load.
: java.io.IOException: Failed to open native connection to Cassandra at
{192.168.1.126}:9042
at 
com.datastax.spark.connector.cql.CassandraConnector$.com$datastax$spark$conn
ector$cql$CassandraConnector$$createSession(CassandraConnector.scala:162)
at 
com.datastax.spark.connector.cql.CassandraConnector$$anonfun$2.apply(Cassand
raConnector.scala:148)
at 
com.datastax.spark.connector.cql.CassandraConnector$$anonfun$2.apply(Cassand
raConnector.scala:148)
at 
com.datastax.spark.connector.cql.RefCountedCache.createNewValueAndKeys(RefCo
untedCache.scala:31)
at 
com.datastax.spark.connector.cql.RefCountedCache.acquire(RefCountedCache.sca
la:56)
at 
com.datastax.spark.connector.cql.CassandraConnector.openSession(CassandraCon
nector.scala:81)
at 
com.datastax.spark.connector.cql.CassandraConnector.withSessionDo(CassandraC
onnector.scala:109)
at 
com.datastax.spark.connector.rdd.partitioner.CassandraRDDPartitioner$.getTok
enFactory(CassandraRDDPartitioner.scala:184)
at 
org.apache.spark.sql.cassandra.CassandraSourceRelation$.apply(CassandraSourc
eRelation.scala:267)
at 
org.apache.spark.sql.cassandra.DefaultSource.createRelation(DefaultSource.sc
ala:57)
at 
org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(Resolve
dDataSource.scala:158)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62
)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl
.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
at py4j.Gateway.invoke(Gateway.java

Re: metrics not reported by spark-cassandra-connector

2016-02-23 Thread Sa Xiao
Hi Yin,

Thanks for your reply. I didn't realize there is a specific mailing list
for spark-Cassandra-connector. I will ask there. Thanks!

-Sa

On Tuesday, February 23, 2016, Yin Yang <yy201...@gmail.com> wrote:

> Hi, Sa:
> Have you asked on spark-cassandra-connector mailing list ?
>
> Seems you would get better response there.
>
> Cheers
>


Re: metrics not reported by spark-cassandra-connector

2016-02-23 Thread Yin Yang
Hi, Sa:
Have you asked on spark-cassandra-connector mailing list ?

Seems you would get better response there.

Cheers


metrics not reported by spark-cassandra-connector

2016-02-23 Thread Sa Xiao
Hi there,

I am trying to enable the metrics collection by spark-cassandra-connector,
following the instruction here:
https://github.com/datastax/spark-cassandra-connector/blob/master/doc/11_metrics.md

However, I was not able to see any metrics reported. I'm using
spark-cassandra-connector_2.10:1.5.0, and spark 1.5.1. I am trying to send
the metrics to statsD. My metrics.properties is as follows:

*.sink.statsd.class=org.apache.spark.metrics.sink.StatsDSink

*.sink.statsd.host=localhost

*.sink.statsd.port=18125

executor.source.cassandra-connector.class=org.apache.spark.metrics.CassandraConnectorSource

driver.source.cassandra-connector.class=org.apache.spark.metrics.CassandraConnectorSource

I'm able to see other metrics, e.g. DAGScheduler, but not any from the
CassandraConnectorSource. E.g. I tried to search "write-byte-meter", but
didn't find it. I didn't see the metrics on the spark UI either. I didn't
find any relevant error or info in the log that indicates the
CassandraConnectorSource is actually registered by the spark metrics
system. Any pointers would be very much appreciated!

Thanks,

Sa


Re: [Cassandra-Connector] No Such Method Error despite correct versions

2016-02-22 Thread Jan Algermissen
Doh - minutes after sending my question I saw the same issue reported a couple of days ago.

Indeed, using C* driver 3.0.0-rc1 seems to solve the issue.

Jan
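
For reference, a minimal sbt sketch of pinning the driver to that version,
assuming the DataStax Java driver's usual Maven coordinates (adjust to your build):

libraryDependencies += "com.datastax.cassandra" % "cassandra-driver-core" % "3.0.0-rc1"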

> On 22 Feb 2016, at 12:13, Jan Algermissen <algermissen1...@icloud.com> wrote:
> 
> Hi,
> 
> I am using
> 
> Cassandra 2.1.5
> Spark 1.5.2
> Cassandra java-drive 3.0.0
> Cassandra-Connector 1.5.0-RC1
> 
> All with scala 2.11.7
> 
> Nevertheless, I get the following error from my Spark job:
> 
> java.lang.NoSuchMethodError: 
> com.datastax.driver.core.TableMetadata.getIndexes()Ljava/util/List;
> at com.datastax.spark.connector.cql.Schema$.getIndexMap(Schema.scala:198)
> at 
> com.datastax.spark.connector.cql.Schema$.com$datastax$spark$connector$cql$Schema$$fetchPartitionKey(Schema.scala:202)
> 
> I checked that getIndexMap() is present in the version used - it is.
> 
> I also decompiled the class TableMetadata directly from the fat jar I am 
> deploying. The method is present. The only thing I noticed is that the 
> signature the decompiler shows is
> 
> public Collection getIndexes() {
> 
> and not List - but I guess that is just the recompilation.
> 
> I am pretty lost regarding what steps to take to get rid of this error.
> 
> Can anyone provide a clue?
> 
> Jan
> 
> 
> 


-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



[Cassandra-Connector] No Such Method Error despite correct versions

2016-02-22 Thread Jan Algermissen
Hi,

I am using

Cassandra 2.1.5
Spark 1.5.2
Cassandra Java driver 3.0.0
Cassandra-Connector 1.5.0-RC1

All with scala 2.11.7

Nevertheless, I get the following error from my Spark job:

java.lang.NoSuchMethodError: 
com.datastax.driver.core.TableMetadata.getIndexes()Ljava/util/List;
at com.datastax.spark.connector.cql.Schema$.getIndexMap(Schema.scala:198)
at 
com.datastax.spark.connector.cql.Schema$.com$datastax$spark$connector$cql$Schema$$fetchPartitionKey(Schema.scala:202)

I checked that getIndexMap() is present in the version used - it is.

I also decompiled the class TableMetadata directly from the fat jar I am 
deploying. The method is present. The only thing I noticed is that the 
signature the decompiler shows is

public Collection getIndexes() {

and not List - but I guess that is just an artifact of the decompilation.

I am pretty lost regarding what steps to take to get rid of this error.

Can anyone provide a clue?

Jan




-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



spark-cassandra-connector BulkOutputWriter

2016-02-09 Thread Alexandr Dzhagriev
Hello all,

I looked through the cassandra spark integration (
https://github.com/datastax/spark-cassandra-connector) and couldn't find
any usages of the BulkOutputWriter (
http://www.datastax.com/dev/blog/bulk-loading) - an awesome tool for
creating local sstables, which could be later uploaded to a cassandra
cluster.  Seems like (sorry if I'm wrong), it uses just bulk insert
statements. So, my question is: does anybody know if there are any plans to
support bulk loading?

Cheers, Alex.


RE: spark-cassandra-connector BulkOutputWriter

2016-02-09 Thread Mohammed Guller
Alex – I suggest posting this question on the Spark Cassandra Connector mailing 
list. The SCC developers are pretty responsive.

Mohammed
Author: Big Data Analytics with 
Spark<http://www.amazon.com/Big-Data-Analytics-Spark-Practitioners/dp/1484209656/>

From: Alexandr Dzhagriev [mailto:dzh...@gmail.com]
Sent: Tuesday, February 9, 2016 6:52 AM
To: user
Subject: spark-cassandra-connector BulkOutputWriter

Hello all,

I looked through the cassandra spark integration 
(https://github.com/datastax/spark-cassandra-connector) and couldn't find any 
usages of the BulkOutputWriter (http://www.datastax.com/dev/blog/bulk-loading) 
- an awesome tool for creating local sstables, which could be later uploaded to 
a cassandra cluster.  Seems like (sorry if I'm wrong), it uses just bulk insert 
statements. So, my question is: does anybody know if there are any plans to 
support bulk loading?

Cheers, Alex.


Re: Spark 1.5.2 compatible spark-cassandra-connector

2015-12-29 Thread fightf...@163.com
Hi, Vivek M
I have tried the 1.5.x spark-cassandra connector and indeed encountered some
classpath issues, mainly with the guava dependency.
I believe that can be solved by some Maven config, but I have not tried that yet.

Best,
Sun.



fightf...@163.com
 
From: vivek.meghanat...@wipro.com
Date: 2015-12-29 20:40
To: user@spark.apache.org
Subject: Spark 1.5.2 compatible spark-cassandra-connector
All,
What is the compatible spark-cassandra-connector for spark 1.5.2? I can only 
find the latest connector version spark-cassandra-connector_2.10-1.5.0-M3 which 
has dependency with 1.5.1 spark. Can we use the same for 1.5.2? Any classpath 
issues needs to be handled or any jars needs to be excluded while packaging the 
application jar?
 
http://central.maven.org/maven2/com/datastax/spark/spark-cassandra-connector_2.10/1.5.0-M3/spark-cassandra-connector_2.10-1.5.0-M3.pom
 
Regards,
Vivek M


Re: Spark 1.5.2 compatible spark-cassandra-connector

2015-12-29 Thread mwy
2.10-1.5.0-M3 & spark 1.5.2 work for me. The jar is built by sbt-assembly.

Just for reference.

From:  "fightf...@163.com" <fightf...@163.com>
Date:  Wednesday, December 30, 2015 at 10:22
To:  "vivek.meghanat...@wipro.com" <vivek.meghanat...@wipro.com>, user
<user@spark.apache.org>
Subject:  Re: Spark 1.5.2 compatible spark-cassandra-connector

Hi, Vivek M
I had ever tried 1.5.x spark-cassandra connector and indeed encounter some
classpath issues, mainly for the guaua dependency.
I believe that can be solved by some maven config, but have not tried that
yet. 

Best,
Sun.


fightf...@163.com
>  
> From: vivek.meghanat...@wipro.com
> Date: 2015-12-29 20:40
> To: user@spark.apache.org
> Subject: Spark 1.5.2 compatible spark-cassandra-connector
> All,
> What is the compatible spark-cassandra-connector for spark 1.5.2? I can only
> find the latest connector version spark-cassandra-connector_2.10-1.5.0-M3
> which has dependency with 1.5.1 spark. Can we use the same for 1.5.2? Any
> classpath issues needs to be handled or any jars needs to be excluded while
> packaging the application jar?
>  
> http://central.maven.org/maven2/com/datastax/spark/spark-cassandra-connector_2
> .10/1.5.0-M3/spark-cassandra-connector_2.10-1.5.0-M3.pom
>  
> Regards,
> Vivek M




RE: Spark 1.5.2 compatible spark-cassandra-connector

2015-12-29 Thread vivek.meghanathan
Thank you mwy and Sun for your responses. Yes, basic things are working for me
using this connector (the guava issue was encountered earlier, but with proper
exclusion of the old version we have resolved it).

The current issue is a strange one - we have a Kafka-Spark-Cassandra streaming
job in Spark. All the jobs are working fine in the local cluster (local lab) using
Spark 1.3.0. We are trying the same setup on Google Cloud Platform: on 1.3.0 the
jobs are up, but they do not seem to be processing the Kafka messages. Using
Spark 1.5.2 with the 1.5.0-M3 connector + Kafka 0.8.2.2 I am able to make most of
the jobs work, but one of them is processing only 1 out of 50 messages from
Kafka.
We wrote a test program (we use Scala) to parse data from the Kafka stream - it
is working fine and always receives the messages. It connects to Cassandra,
gets some data, works fine, and prints the input data up to that point, but when I
enable the join/map/reduce logic, it starts to miss the messages (not printing
the lines that worked above, either). Please find the lines added below; once I add
the commented lines to the job, it does not print any incoming data, whereas I was
able to print up to then. Any ideas how to debug this?

val cqlOfferSummaryRdd = ssc.sc.cql3Cassandra[OfferSummary](casOfferSummary)
  .map(summary => (summary.listingId, (summary.offerId, summary.redeem, summary.viewed, summary.reserved)))

/**
  //RDD -> ((listingId, searchId), Iterable(offerId, redeemCount, viewCount))
  val directListingOfferSummary = directViews.transform(Rdd => { Rdd.join(cqlOfferSummaryRdd, 10) }) //RDD -> ((listingId), (Direct, (offerId, redeemCount, viewCount)))
    .map(rdd => ((rdd._1, rdd._2._1.searchId, rdd._2._2._1), (rdd._2._2._2, rdd._2._2._3, rdd._2._2._4))) //RDD -> ((listingId, searchId, offerId), (redeemCount, viewCount))
    .reduceByKey((a, b) => (a._1 + b._1, a._2 + b._2, a._3 + b._3))
    .map(rdd => ((rdd._1._1, rdd._1._2), (rdd._1._3, rdd._2._1, rdd._2._2, rdd._2._3))).groupByKey(10) //RDD -> ((listingId, searchId), Iterable(offerId, redeemCount, viewCount))
  directListingOfferSummary.print()
**/

Regards,
Vivek

From: mwy [mailto:wenyao...@dewmobile.net]
Sent: 30 December 2015 08:27
To: fightf...@163.com; Vivek Meghanathan (WT01 - NEP) 
<vivek.meghanat...@wipro.com>; user <user@spark.apache.org>
Subject: Re: Spark 1.5.2 compatible spark-cassandra-connector

2.10-1.5.0-M3 & spark 1.5.2 work for me. The jar is built by sbt-assembly.

Just for reference.

From: "fightf...@163.com<mailto:fightf...@163.com>"
<fightf...@163.com<mailto:fightf...@163.com>>
Date: Wednesday, December 30, 2015 at 10:22
To: "vivek.meghanat...@wipro.com<mailto:vivek.meghanat...@wipro.com>"
<vivek.meghanat...@wipro.com<mailto:vivek.meghanat...@wipro.com>>, user
<user@spark.apache.org<mailto:user@spark.apache.org>>
Subject: Re: Spark 1.5.2 compatible spark-cassandra-connector

Hi, Vivek M
I had ever tried 1.5.x spark-cassandra connector and indeed encounter some 
classpath issues, mainly for the guaua dependency.
I believe that can be solved by some maven config, but have not tried that yet.

Best,
Sun.


fightf...@163.com<mailto:fightf...@163.com>

From: vivek.meghanat...@wipro.com<mailto:vivek.meghanat...@wipro.com>
Date: 2015-12-29 20:40
To: user@spark.apache.org<mailto:user@spark.apache.org>
Subject: Spark 1.5.2 compatible spark-cassandra-connector
All,
What is the compatible spark-cassandra-connector for spark 1.5.2? I can only 
find the latest connector version spark-cassandra-connector_2.10-1.5.0-M3 which 
has dependency with 1.5.1 spark. Can we use the same for 1.5.2? Any classpath 
issues needs to be handled or any jars needs to be excluded while packaging the 
application jar?

http://central.maven.org/maven2/com/datastax/spark/spark-cassandra-connector_2.10/1.5.0-M3/spark-cassandra-connector_2.10-1.5.0-M3.pom

Regards,
Vivek M

Spark 1.5.2 compatible spark-cassandra-connector

2015-12-29 Thread vivek.meghanathan
All,
What is the compatible spark-cassandra-connector for Spark 1.5.2? I can only
find the latest connector version, spark-cassandra-connector_2.10-1.5.0-M3, which
depends on Spark 1.5.1. Can we use the same for 1.5.2? Do any classpath
issues need to be handled, or do any jars need to be excluded while packaging the
application jar?

http://central.maven.org/maven2/com/datastax/spark/spark-cassandra-connector_2.10/1.5.0-M3/spark-cassandra-connector_2.10-1.5.0-M3.pom

Regards,
Vivek M


Re: error in spark cassandra connector

2015-12-24 Thread Ted Yu
Mind providing a bit more detail ?

Release of Spark
version of Cassandra connector
How job was submitted
complete stack trace

Thanks

On Thu, Dec 24, 2015 at 2:06 AM, Vijay Kandiboyina <vi...@inndata.in> wrote:

> java.lang.NoClassDefFoundError:
> com/datastax/spark/connector/rdd/CassandraTableScanRDD
>
>


error in spark cassandra connector

2015-12-24 Thread Vijay Kandiboyina
java.lang.NoClassDefFoundError:
com/datastax/spark/connector/rdd/CassandraTableScanRDD


Spark- Cassandra Connector Error

2015-11-25 Thread ahlusar
Hello,

I receive the following error when I attempt to connect to a Cassandra
keyspace and table: WARN NettyUtil: 

"Found Netty's native epoll transport, but not running on linux-based
operating system. Using NIO instead"

The full details and log can be viewed here:
http://stackoverflow.com/questions/33896937/spark-connector-error-warn-nettyutil-found-nettys-native-epoll-transport-but


Thank you for your help and for your support. 




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Cassandra-Connector-Error-tp25483.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Spark-Cassandra-connector

2015-08-21 Thread Ted Yu
Have you considered asking this question on
https://groups.google.com/a/lists.datastax.com/forum/#!forum/spark-connector-user
?

Cheers

On Thu, Aug 20, 2015 at 10:57 PM, Samya samya.ma...@amadeus.com wrote:

 Hi All,

 I need to write an RDD to Cassandra  using the sparkCassandraConnector
 from
 DataStax. My application is using Yarn.

 *Some basic Questions :*
 1.  Will a call to saveToCassandra(.), be using the same connection
 object between all task in a given executor? I mean is there 1 (one)
 connection object per executor, that is shared between tasks ?
 2. If the above answer is YES, is there a way to create a connectionPool
 for
 each executor, so that multiple task can dump data to cassandra in
 parallel?

 Regards,
 Samya



 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Cassandra-connector-tp24378.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org




Spark-Cassandra-connector

2015-08-20 Thread Samya
Hi All,

I need to write an RDD to Cassandra  using the sparkCassandraConnector from
DataStax. My application is using Yarn.

Some basic questions:
1. Will a call to saveToCassandra(...) use the same connection
object across all tasks in a given executor? I mean, is there one
connection object per executor that is shared between tasks?
2. If the above answer is YES, is there a way to create a connection pool for
each executor, so that multiple tasks can dump data to Cassandra in parallel?
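For context, the pattern I have seen recommended elsewhere is roughly the following;
this is a self-contained sketch with made-up names and CQL, not something taken from
this thread. My understanding (please correct me) is that CassandraConnector keeps a
per-JVM session cache, so tasks running on the same executor already end up sharing
the underlying connection pool:

import com.datastax.spark.connector.cql.CassandraConnector
import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: MyRecord, the keyspace/table and the CQL statement are hypothetical.
case class MyRecord(id: String, value: Int)

object ParallelWriteSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("parallel-write-sketch")
      .set("spark.cassandra.connection.host", "127.0.0.1")
    val sc = new SparkContext(conf)
    val connector = CassandraConnector(conf)   // serializable handle, safe to use inside closures

    val rdd = sc.parallelize(Seq(MyRecord("a", 1), MyRecord("b", 2)))

    rdd.foreachPartition { partition =>
      // withSessionDo borrows a session from the connector's per-JVM cache, so tasks
      // in the same executor reuse the same underlying connection pool.
      connector.withSessionDo { session =>
        partition.foreach { r =>
          session.execute("UPDATE ks.tbl SET value = ? WHERE id = ?", Int.box(r.value), r.id)
        }
      }
    }
    sc.stop()
  }
}

Is that the right way to think about it, or is a separate pool per executor still needed?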

Regards,
Samya



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Cassandra-connector-tp24378.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Spark Cassandra Connector issue

2015-08-10 Thread satish chandra j
Hi All,
I have tried the commands as mentioned below, but it still errors out:

dse spark-submit --master spark://10.246.43.15:7077 --class HelloWorld
--jars /home/missingmerch/
postgresql-9.4-1201.jdbc41.jar,/home/missingmerch/dse.jar,/home/missingmerch/spark-
cassandra-connector-java_2.10-1.1.1.jar /home/missingmerch/etl-0.0.
1-SNAPSHOT.jar

dse spark-submit --master spark://10.246.43.15:7077 --class HelloWorld
--jars /home/missingmerch/
postgresql-9.4-1201.jdbc41.jar,/home/missingmerch/dse.jar,/home/missingmerch/spark-
cassandra-connector-java_2.10-1.1.1.jar,/home/missingmerch/etl-0.0.
1-SNAPSHOT.jar

I understand the only problem is with the way I provide the list of jar files in the
command. If anybody using DataStax Enterprise could please provide their
inputs to get this issue resolved, that would be great.

Thanks for your support

Satish Chandra


On Mon, Aug 10, 2015 at 7:16 PM, Dean Wampler deanwamp...@gmail.com wrote:

 I don't know if DSE changed spark-submit, but you have to use a
 comma-separated list of jars to --jars. It probably looked for HelloWorld
 in the second one, the dse.jar file. Do this:

 dse spark-submit --master spark://10.246.43.15:7077 --class HelloWorld
 --jars /home/missingmerch/
 postgresql-9.4-1201.jdbc41.jar,/home/missingmerch/dse.jar,/home/missingmerch/spark-
 cassandra-connector-java_2.10-1.1.1.jar /home/missingmerch/etl-0.0.
 1-SNAPSHOT.jar

 I also removed the extra //. Or put file: in front of them so they are
 proper URLs. Note the snapshot jar isn't in the --jars list. I assume
 that's where HelloWorld is found. Confusing, yes it is...

 dean

 Dean Wampler, Ph.D.
 Author: Programming Scala, 2nd Edition
 http://shop.oreilly.com/product/0636920033073.do (O'Reilly)
 Typesafe http://typesafe.com
 @deanwampler http://twitter.com/deanwampler
 http://polyglotprogramming.com

 On Mon, Aug 10, 2015 at 8:23 AM, satish chandra j 
 jsatishchan...@gmail.com wrote:

 Hi,
 Thanks for quick input, now I am getting class not found error

 *Command:*

 dse spark-submit --master spark://10.246.43.15:7077 --class HelloWorld
 --jars ///home/missingmerch/postgresql-9.4-1201.jdbc41.jar
 ///home/missingmerch/dse.jar
 ///home/missingmerch/spark-cassandra-connector-java_2.10-1.1.1.jar
 ///home/missingmerch/etl-0.0.1-SNAPSHOT.jar


 *Error:*

 java.lang.ClassNotFoundException: HelloWorld

 at java.net.URLClassLoader$1.run(URLClassLoader.java:366)

 at java.net.URLClassLoader$1.run(URLClassLoader.java:355)

 at java.security.AccessController.doPrivileged(Native Method)

 at java.net.URLClassLoader.findClass(URLClassLoader.java:354)

 at java.lang.ClassLoader.loadClass(ClassLoader.java:425)

 at java.lang.ClassLoader.loadClass(ClassLoader.java:358)

 at java.lang.Class.forName0(Native Method)

 at java.lang.Class.forName(Class.java:270)

 at
 org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:342)

 at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)

 at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)


 Previously I could fix the issue by changing the order of arguments
 passing in DSE command line interface but now I am not sure why the issue
 again

 Please let me know if still I am missing anything in my Command as
 mentioned above(as insisted I have added dse.jar and
 spark-cassandra-connector-java_2.10.1.1.1.jar)


 Thanks for support


 Satish Chandra

 On Mon, Aug 10, 2015 at 6:19 PM, Dean Wampler deanwamp...@gmail.com
 wrote:

 Add the other Cassandra dependencies (dse.jar,
 spark-cassandra-connect-java_2.10) to your --jars argument on the command
 line.

 Dean Wampler, Ph.D.
 Author: Programming Scala, 2nd Edition
 http://shop.oreilly.com/product/0636920033073.do (O'Reilly)
 Typesafe http://typesafe.com
 @deanwampler http://twitter.com/deanwampler
 http://polyglotprogramming.com

 On Mon, Aug 10, 2015 at 7:44 AM, satish chandra j 
 jsatishchan...@gmail.com wrote:

 HI All,
 Please help me to fix Spark Cassandra Connector issue, find the details
 below

 *Command:*

 dse spark-submit --master spark://10.246.43.15:7077 --class HelloWorld
 --jars ///home/missingmerch/postgresql-9.4-1201.jdbc41.jar
 ///home/missingmerch/etl-0.0.1-SNAPSHOT.jar


 *Error:*


 WARN  2015-08-10 06:33:35 org.apache.spark.util.Utils: Service
 'SparkUI' could not bind on port 4040. Attempting port 4041.

 Exception in thread main java.lang.NoSuchMethodError:
 com.datastax.spark.connector.package$.toRDDFunctions(Lorg/apache/spark/rdd/RDD;Lscala/reflect/ClassTag;)Lcom/datastax/spark/connector/RDDFunctions;

 at HelloWorld$.main(HelloWorld.scala:29)

 at HelloWorld.main(HelloWorld.scala)

 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

 at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)

 at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

 at java.lang.reflect.Method.invoke(Method.java:606

Spark Cassandra Connector issue

2015-08-10 Thread satish chandra j
HI All,
Please help me to fix Spark Cassandra Connector issue, find the details
below

*Command:*

dse spark-submit --master spark://10.246.43.15:7077 --class HelloWorld
--jars ///home/missingmerch/postgresql-9.4-1201.jdbc41.jar
///home/missingmerch/etl-0.0.1-SNAPSHOT.jar


*Error:*


WARN  2015-08-10 06:33:35 org.apache.spark.util.Utils: Service 'SparkUI'
could not bind on port 4040. Attempting port 4041.

Exception in thread main java.lang.NoSuchMethodError:
com.datastax.spark.connector.package$.toRDDFunctions(Lorg/apache/spark/rdd/RDD;Lscala/reflect/ClassTag;)Lcom/datastax/spark/connector/RDDFunctions;

at HelloWorld$.main(HelloWorld.scala:29)

at HelloWorld.main(HelloWorld.scala)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)

at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

at java.lang.reflect.Method.invoke(Method.java:606)

at
org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)

at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)

at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)


Code:

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import org.apache.spark.rdd.JdbcRDD
import com.datastax.spark.connector._
import com.datastax.spark.connector.cql.CassandraConnector
import com.datastax.bdp.spark.DseSparkConfHelper._
import java.sql.{Connection, DriverManager, ResultSet, PreparedStatement, SQLException, Statement}

object HelloWorld {
  def main(args: Array[String]) {
    def createSparkContext() = {
      val myJar = getClass.getProtectionDomain.getCodeSource.getLocation.getPath

      val conf = new SparkConf().set("spark.cassandra.connection.host", "10.246.43.15")
        .setAppName("First Spark App")
        .setMaster("local")
        .setJars(Array(myJar))
        .set("cassandra.username", "username")
        .set("cassandra.password", "password")
        .forDse

      new SparkContext(conf)
    }

    val sc = createSparkContext()
    val user = "hkonak0"
    val pass = "Winter18"
    Class.forName("org.postgresql.Driver").newInstance
    val url = "jdbc:postgresql://gptester:5432/db_test"

    val myRDD27 = new JdbcRDD(sc, () => DriverManager.getConnection(url, user, pass),
      "select * from wmax_vmax.arm_typ_txt LIMIT ? OFFSET ?", 5, 0, 1,
      (r: ResultSet) => (r.getInt("alarm_type_code"), r.getString("language_code"),
        r.getString("alrm_type_cd_desc")))

    myRDD27.saveToCassandra("keyspace", "arm_typ_txt",
      SomeColumns("alarm_type_code", "language_code", "alrm_type_cd_desc"))

    println(myRDD27.count())
    println(myRDD27.first)

    sc.stop()
    sys.exit()
  }
}



POM XML:

<dependencies>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.2.2</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>1.2.1</version>
  </dependency>
  <dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-library</artifactId>
    <version>2.10.5</version>
  </dependency>
  <dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>3.8.1</version>
    <scope>test</scope>
  </dependency>
  <dependency>
    <groupId>com.datastax.dse</groupId>
    <artifactId>dse</artifactId>
    <version>4.7.2</version>
    <scope>system</scope>
    <systemPath>C:\workspace\etl\lib\dse.jar</systemPath>
  </dependency>
  <dependency>
    <groupId>com.datastax.spark</groupId>
    <artifactId>spark-cassandra-connector-java_2.10</artifactId>
    <version>1.1.1</version>
  </dependency>
</dependencies>


Please let me know if any further details are required to analyze the issue.


Regards,

Satish Chandra


Re: Spark Cassandra Connector issue

2015-08-10 Thread Dean Wampler
Add the other Cassandra dependencies (dse.jar,
spark-cassandra-connect-java_2.10) to your --jars argument on the command
line.

Dean Wampler, Ph.D.
Author: Programming Scala, 2nd Edition
http://shop.oreilly.com/product/0636920033073.do (O'Reilly)
Typesafe http://typesafe.com
@deanwampler http://twitter.com/deanwampler
http://polyglotprogramming.com

On Mon, Aug 10, 2015 at 7:44 AM, satish chandra j jsatishchan...@gmail.com
wrote:

 HI All,
 Please help me to fix Spark Cassandra Connector issue, find the details
 below

 *Command:*

 dse spark-submit --master spark://10.246.43.15:7077 --class HelloWorld
 --jars ///home/missingmerch/postgresql-9.4-1201.jdbc41.jar
 ///home/missingmerch/etl-0.0.1-SNAPSHOT.jar


 *Error:*


 WARN  2015-08-10 06:33:35 org.apache.spark.util.Utils: Service 'SparkUI'
 could not bind on port 4040. Attempting port 4041.

 Exception in thread main java.lang.NoSuchMethodError:
 com.datastax.spark.connector.package$.toRDDFunctions(Lorg/apache/spark/rdd/RDD;Lscala/reflect/ClassTag;)Lcom/datastax/spark/connector/RDDFunctions;

 at HelloWorld$.main(HelloWorld.scala:29)

 at HelloWorld.main(HelloWorld.scala)

 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

 at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)

 at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

 at java.lang.reflect.Method.invoke(Method.java:606)

 at
 org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)

 at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)

 at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)


 *Code:*

 *import* *org.apache*.spark.SparkContext

 *import* *org.apache*.spark.SparkContext._

 *import* *org.apache*.spark.SparkConf

 *import* *org.apache*.spark.rdd.JdbcRDD

 *import* *com.datastax*.spark.connector._

 *import* com.datastax.spark.connector.cql.CassandraConnector

 *import* com.datastax.bdp.spark.DseSparkConfHelper._

 *import* java.sql.{Connection, DriverManager, ResultSet,
 PreparedStatement, SQLException, Statement}

 *object* HelloWorld {

 *def* main(args: Array[String]) {

   *def* createSparkContext() = {

*val** myJar = 
 *getClass.getProtectionDomain.getCodeSource.getLocation.getPath


*val* conf = *new* SparkConf().set(
 spark.cassandra.connection.host, 10.246.43.15)

.setAppName(First Spark App)

.setMaster(local)

 *   .s*etJars(Array(myJar))

.set(cassandra.username, username)

.set(cassandra.password, password)

.forDse

*new* SparkContext(conf)

 }



   *val* sc = createSparkContext()

   *val* user=hkonak0

   *val** pass=*Winter18

   Class.forName(org.postgresql.Driver).newInstance

   *val* url = jdbc:postgresql://gptester:5432/db_test

   *val* myRDD27 = *new* JdbcRDD( sc, ()=
 DriverManager.getConnection(url,user,pass),select * from
 wmax_vmax.arm_typ_txt LIMIT ? OFFSET ?,5,0,1,(r: ResultSet) =
 {(r.getInt(alarm_type_code),r.getString(language_code),r.getString(
 alrm_type_cd_desc))})

   myRDD27.saveToCassandra(keyspace,arm_typ_txt,SomeColumns(
 alarm_type_code,language_code,alrm_type_cd_desc))

   println(myRDD27.count())

   println(myRDD27.first)

   sc.stop()

   sys.exit()



 }

   }



 *POM XML:*


 dependencies

   dependency

  groupIdorg.apache.spark/groupId

  artifactIdspark-core_2.10/artifactId

  version1.2.2/version

   /dependency

   dependency

  groupIdorg.apache.hadoop/groupId

  artifactId*hadoop*-client/artifactId

  version1.2.1/version

   /dependency

   dependency

  groupIdorg.scala-*lang*/groupId

  artifactId*scala*-library/artifactId

  version2.10.5/version

   /dependency

   dependency

  groupId*junit*/groupId

  artifactId*junit*/artifactId

  version3.8.1/version

  scopetest/scope

   /dependency

   dependency

  groupIdcom.datastax.dse/groupId

  artifactId*dse*/artifactId

  version4.7.2/version

   scopesystem/scope

  systemPathC:\workspace\*etl*\*lib*\dse.jar/
 systemPath

/dependency

   dependency

   groupIdcom.datastax.spark/groupId

   artifactIdspark-*cassandra*-connector-java_2.10/
 artifactId

   version1.1.1/version

/dependency

/dependencies


 Please let me know if any further details required to analyze the issue


 Regards,

 Satish Chandra



Re: Spark Cassandra Connector issue

2015-08-10 Thread Dean Wampler
I don't know if DSE changed spark-submit, but you have to use a
comma-separated list of jars to --jars. It probably looked for HelloWorld
in the second one, the dse.jar file. Do this:

dse spark-submit --master spark://10.246.43.15:7077 --class HelloWorld
--jars /home/missingmerch/
postgresql-9.4-1201.jdbc41.jar,/home/missingmerch/dse.jar,/home/missingmerch/spark-
cassandra-connector-java_2.10-1.1.1.jar /home/missingmerch/etl-0.0.
1-SNAPSHOT.jar

I also removed the extra //. Or put file: in front of them so they are
proper URLs. Note the snapshot jar isn't in the --jars list. I assume
that's where HelloWorld is found. Confusing, yes it is...

dean

Dean Wampler, Ph.D.
Author: Programming Scala, 2nd Edition
http://shop.oreilly.com/product/0636920033073.do (O'Reilly)
Typesafe http://typesafe.com
@deanwampler http://twitter.com/deanwampler
http://polyglotprogramming.com

On Mon, Aug 10, 2015 at 8:23 AM, satish chandra j jsatishchan...@gmail.com
wrote:

 Hi,
 Thanks for quick input, now I am getting class not found error

 *Command:*

 dse spark-submit --master spark://10.246.43.15:7077 --class HelloWorld
 --jars ///home/missingmerch/postgresql-9.4-1201.jdbc41.jar
 ///home/missingmerch/dse.jar
 ///home/missingmerch/spark-cassandra-connector-java_2.10-1.1.1.jar
 ///home/missingmerch/etl-0.0.1-SNAPSHOT.jar


 *Error:*

 java.lang.ClassNotFoundException: HelloWorld

 at java.net.URLClassLoader$1.run(URLClassLoader.java:366)

 at java.net.URLClassLoader$1.run(URLClassLoader.java:355)

 at java.security.AccessController.doPrivileged(Native Method)

 at java.net.URLClassLoader.findClass(URLClassLoader.java:354)

 at java.lang.ClassLoader.loadClass(ClassLoader.java:425)

 at java.lang.ClassLoader.loadClass(ClassLoader.java:358)

 at java.lang.Class.forName0(Native Method)

 at java.lang.Class.forName(Class.java:270)

 at
 org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:342)

 at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)

 at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)


 Previously I could fix the issue by changing the order of arguments
 passing in DSE command line interface but now I am not sure why the issue
 again

 Please let me know if still I am missing anything in my Command as
 mentioned above(as insisted I have added dse.jar and
 spark-cassandra-connector-java_2.10.1.1.1.jar)


 Thanks for support


 Satish Chandra

 On Mon, Aug 10, 2015 at 6:19 PM, Dean Wampler deanwamp...@gmail.com
 wrote:

 Add the other Cassandra dependencies (dse.jar,
 spark-cassandra-connect-java_2.10) to your --jars argument on the command
 line.

 Dean Wampler, Ph.D.
 Author: Programming Scala, 2nd Edition
 http://shop.oreilly.com/product/0636920033073.do (O'Reilly)
 Typesafe http://typesafe.com
 @deanwampler http://twitter.com/deanwampler
 http://polyglotprogramming.com

 On Mon, Aug 10, 2015 at 7:44 AM, satish chandra j 
 jsatishchan...@gmail.com wrote:

 HI All,
 Please help me to fix Spark Cassandra Connector issue, find the details
 below

 *Command:*

 dse spark-submit --master spark://10.246.43.15:7077 --class HelloWorld
 --jars ///home/missingmerch/postgresql-9.4-1201.jdbc41.jar
 ///home/missingmerch/etl-0.0.1-SNAPSHOT.jar


 *Error:*


 WARN  2015-08-10 06:33:35 org.apache.spark.util.Utils: Service 'SparkUI'
 could not bind on port 4040. Attempting port 4041.

 Exception in thread main java.lang.NoSuchMethodError:
 com.datastax.spark.connector.package$.toRDDFunctions(Lorg/apache/spark/rdd/RDD;Lscala/reflect/ClassTag;)Lcom/datastax/spark/connector/RDDFunctions;

 at HelloWorld$.main(HelloWorld.scala:29)

 at HelloWorld.main(HelloWorld.scala)

 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

 at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)

 at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

 at java.lang.reflect.Method.invoke(Method.java:606)

 at
 org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)

 at
 org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)

 at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)


 *Code:*

 *import* *org.apache*.spark.SparkContext

 *import* *org.apache*.spark.SparkContext._

 *import* *org.apache*.spark.SparkConf

 *import* *org.apache*.spark.rdd.JdbcRDD

 *import* *com.datastax*.spark.connector._

 *import* com.datastax.spark.connector.cql.CassandraConnector

 *import* com.datastax.bdp.spark.DseSparkConfHelper._

 *import* java.sql.{Connection, DriverManager, ResultSet,
 PreparedStatement, SQLException, Statement}

 *object* HelloWorld {

 *def* main(args: Array[String]) {

   *def* createSparkContext() = {

*val** myJar = 
 *getClass.getProtectionDomain.getCodeSource.getLocation.getPath


*val* conf = *new* SparkConf().set

Re: Spark Cassandra Connector issue

2015-08-10 Thread satish chandra j
Hi,
Thanks for the quick input; now I am getting a class-not-found error.

*Command:*

dse spark-submit --master spark://10.246.43.15:7077 --class HelloWorld
--jars ///home/missingmerch/postgresql-9.4-1201.jdbc41.jar
///home/missingmerch/dse.jar
///home/missingmerch/spark-cassandra-connector-java_2.10-1.1.1.jar
///home/missingmerch/etl-0.0.1-SNAPSHOT.jar


*Error:*

java.lang.ClassNotFoundException: HelloWorld

at java.net.URLClassLoader$1.run(URLClassLoader.java:366)

at java.net.URLClassLoader$1.run(URLClassLoader.java:355)

at java.security.AccessController.doPrivileged(Native Method)

at java.net.URLClassLoader.findClass(URLClassLoader.java:354)

at java.lang.ClassLoader.loadClass(ClassLoader.java:425)

at java.lang.ClassLoader.loadClass(ClassLoader.java:358)

at java.lang.Class.forName0(Native Method)

at java.lang.Class.forName(Class.java:270)

at
org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:342)

at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)

at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)


Previously I could fix the issue by changing the order of arguments passed
on the DSE command line interface, but now I am not sure why the issue is back.

Please let me know if I am still missing anything in my command as
mentioned above (as advised, I have added dse.jar and
spark-cassandra-connector-java_2.10-1.1.1.jar).


Thanks for support


Satish Chandra

On Mon, Aug 10, 2015 at 6:19 PM, Dean Wampler deanwamp...@gmail.com wrote:

 Add the other Cassandra dependencies (dse.jar,
 spark-cassandra-connect-java_2.10) to your --jars argument on the command
 line.

 Dean Wampler, Ph.D.
 Author: Programming Scala, 2nd Edition
 http://shop.oreilly.com/product/0636920033073.do (O'Reilly)
 Typesafe http://typesafe.com
 @deanwampler http://twitter.com/deanwampler
 http://polyglotprogramming.com

 On Mon, Aug 10, 2015 at 7:44 AM, satish chandra j 
 jsatishchan...@gmail.com wrote:

 HI All,
 Please help me to fix Spark Cassandra Connector issue, find the details
 below

 *Command:*

 dse spark-submit --master spark://10.246.43.15:7077 --class HelloWorld
 --jars ///home/missingmerch/postgresql-9.4-1201.jdbc41.jar
 ///home/missingmerch/etl-0.0.1-SNAPSHOT.jar


 *Error:*


 WARN  2015-08-10 06:33:35 org.apache.spark.util.Utils: Service 'SparkUI'
 could not bind on port 4040. Attempting port 4041.

 Exception in thread main java.lang.NoSuchMethodError:
 com.datastax.spark.connector.package$.toRDDFunctions(Lorg/apache/spark/rdd/RDD;Lscala/reflect/ClassTag;)Lcom/datastax/spark/connector/RDDFunctions;

 at HelloWorld$.main(HelloWorld.scala:29)

 at HelloWorld.main(HelloWorld.scala)

 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

 at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)

 at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

 at java.lang.reflect.Method.invoke(Method.java:606)

 at
 org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)

 at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)

 at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)


 *Code:*

 *import* *org.apache*.spark.SparkContext

 *import* *org.apache*.spark.SparkContext._

 *import* *org.apache*.spark.SparkConf

 *import* *org.apache*.spark.rdd.JdbcRDD

 *import* *com.datastax*.spark.connector._

 *import* com.datastax.spark.connector.cql.CassandraConnector

 *import* com.datastax.bdp.spark.DseSparkConfHelper._

 *import* java.sql.{Connection, DriverManager, ResultSet,
 PreparedStatement, SQLException, Statement}

 *object* HelloWorld {

 *def* main(args: Array[String]) {

   *def* createSparkContext() = {

*val** myJar = 
 *getClass.getProtectionDomain.getCodeSource.getLocation.getPath


*val* conf = *new* SparkConf().set(
 spark.cassandra.connection.host, 10.246.43.15)

.setAppName(First Spark App)

.setMaster(local)

 *   .s*etJars(Array(myJar))

.set(cassandra.username, username)

.set(cassandra.password, password)

.forDse

*new* SparkContext(conf)

 }



   *val* sc = createSparkContext()

   *val* user=hkonak0

   *val** pass=*Winter18

   Class.forName(org.postgresql.Driver).newInstance

   *val* url = jdbc:postgresql://gptester:5432/db_test

   *val* myRDD27 = *new* JdbcRDD( sc, ()=
 DriverManager.getConnection(url,user,pass),select * from
 wmax_vmax.arm_typ_txt LIMIT ? OFFSET ?,5,0,1,(r: ResultSet) =
 {(r.getInt(alarm_type_code),r.getString(language_code),r.getString(
 alrm_type_cd_desc))})

   myRDD27.saveToCassandra(keyspace,arm_typ_txt,SomeColumns(
 alarm_type_code,language_code,alrm_type_cd_desc))

   println(myRDD27.count())

   println(myRDD27.first)

   sc.stop()

   sys.exit()



 }

   }



 *POM

Spark-Cassandra connector DataFrame

2015-07-28 Thread simon wang
Hi,
I would like to get some recommendations on using the Spark-Cassandra connector
DataFrame feature.
I was trying to save a DataFrame containing 8 million rows to Cassandra through
the Spark-Cassandra connector. Based on the Spark log, this single action took
about 60 minutes to complete, which seems very slow.
Are there some configurations I need to check when using the Spark-Cassandra
connector DataFrame feature?
From the Spark log, I can see that saving the DataFrame to Cassandra was performed
in 200 small steps, and that the Cassandra database was connected and disconnected 4
times during the 60 minutes. This number matches the number of nodes the Cassandra
cluster has.
I understand this DataFrame feature is experimental, and I am new to both Spark
and Cassandra. Any suggestions are much appreciated.
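For reference, here is what I am planning to try next; a sketch, not a verified fix.
The 200 small steps look like the default spark.sql.shuffle.partitions, and the
output option names are taken from the connector's reference configuration as I
recall it for the 1.3/1.4 line, so both the names and the values are assumptions to
double-check:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{SQLContext, SaveMode}

object DfWriteTuningSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("df-write-tuning-sketch")
      .set("spark.cassandra.connection.host", "127.0.0.1")      // placeholder
      .set("spark.cassandra.output.concurrent.writes", "8")     // async writes in flight per task
      .set("spark.cassandra.output.batch.size.rows", "200")     // rows per unlogged batch
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Toy stand-in for the 8 million row DataFrame.
    val df = sc.parallelize(1 to 1000).map(i => (i, s"row-$i")).toDF("id", "payload")

    df.repartition(40)                     // fewer, larger write tasks than the default 200
      .write
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "my_ks", "table" -> "my_table"))  // placeholder names
      .mode(SaveMode.Append)
      .save()

    sc.stop()
  }
}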
Thanks,
Simon Wang


Re: Spark Cassandra connector number of Tasks

2015-05-10 Thread vijaypawnarkar
Looking for help with this. Thank you!



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Cassandra-connector-number-of-Tasks-tp22820p22839.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Spark Cassandra connector number of Tasks

2015-05-08 Thread vijaypawnarkar
I am using the Spark Cassandra connector to work with a table with 3 million
records, using the .where() API to work with only certain rows in this table.
The where clause filters the data down to 10,000 rows.

CassandraJavaUtil.javaFunctions(sparkContext).cassandraTable(KEY_SPACE,
MY_TABLE, CassandraJavaUtil.mapRowTo(MyClass.class)).where(cqlDataFilter,
cqlFilterParams)

I am also using the parameter spark.cassandra.input.split.size=1000.

As this job is processed by the Spark cluster, it created 3000 partitions
instead of 10, and 3000 tasks are executed on the cluster. As the data in our
table grows to 30 million rows, this will create 30,000 tasks instead of 10.

Is there a better way to process these 10,000 records with 10 tasks?
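One workaround I am considering is to collapse the filtered RDD into a small number
of tasks after the read, since (as far as I can tell) the connector builds one Spark
partition per group of token ranges regardless of how few rows the where() filter
matches. A sketch in Scala for brevity; the keyspace, table and filter column are
placeholders standing in for KEY_SPACE, MY_TABLE and cqlDataFilter:

import com.datastax.spark.connector._
import org.apache.spark.{SparkConf, SparkContext}

object SplitCountSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("split-count-sketch")
      .set("spark.cassandra.connection.host", "127.0.0.1")   // placeholder
      .set("spark.cassandra.input.split.size", "1000")
    val sc = new SparkContext(conf)

    val filtered = sc.cassandraTable("key_space", "my_table")
      .where("bucket = ?", 42)   // placeholder predicate; must be a valid pushdown column
      .coalesce(10)              // collapse the many token-range partitions into ~10 tasks

    println(filtered.count())
    sc.stop()
  }
}

Is coalescing like this reasonable, or is there a connector-level setting that does it better?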

Thanks!



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Cassandra-connector-number-of-Tasks-tp22820.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Spark Cassandra Connector

2015-04-18 Thread DStrip
Hello,

I am facing some difficulties installing the Cassandra Spark connector.
Specifically, I am working with Cassandra 2.0.13 and Spark 1.2.1. I am trying
to build (create) the JAR for the connector, but unfortunately I cannot
see in which file I have to declare the dependency

libraryDependencies += "com.datastax.spark" %% "spark-cassandra-connector" % "1.2.0-rc3"

in order to create that version of the connector jar and not the
default one (i.e. spark-cassandra-connector-assembly-1.3.0-SNAPSHOT.jar).

I am really new to working with sbt. Any guidance / help would be really
appreciated.
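From what I have gathered so far (please correct me if this is wrong), the
libraryDependencies line is supposed to go into build.sbt at the project root (or
into project/Build.scala for a full build definition), and the application jar is
then built with sbt assembly if the sbt-assembly plugin is configured. A minimal
sketch of what I think build.sbt should look like, with versions as assumptions to
match my cluster:

// build.sbt (sketch; versions are assumptions)
name := "cassandra-spark-job"

scalaVersion := "2.10.4"

libraryDependencies ++= Seq(
  "org.apache.spark"   %% "spark-core"                % "1.2.1" % "provided",
  "com.datastax.spark" %% "spark-cassandra-connector" % "1.2.0-rc3"
)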
Thank you very much.
DS





--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Cassandra-Connector-tp22558.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Using the DataStax Cassandra Connector from PySpark

2014-12-26 Thread Stephen Boesch
Did you receive any response on this? I am trying to load HBase classes
and am getting the same error, "py4j.protocol.Py4JError: Trying to call a
package", even though $HBASE_HOME/lib/* had already been added to
compute-classpath.sh.
2014-10-21 16:02 GMT-07:00 Mike Sukmanowsky mike.sukmanow...@gmail.com:

 Hi there,

 I'm using Spark 1.1.0 and experimenting with trying to use the DataStax
 Cassandra Connector (https://github.com/datastax/spark-cassandra-connector)
 from within PySpark.

 As a baby step, I'm simply trying to validate that I have access to
 classes that I'd need via Py4J. Sample python program:


 from py4j.java_gateway import java_import

 from pyspark.conf import SparkConf
 from pyspark import SparkContext

 conf = SparkConf().set(spark.cassandra.connection.host, 127.0.0.1)
 sc = SparkContext(appName=Spark + Cassandra Example, conf=conf)
 java_import(sc._gateway.jvm, com.datastax.spark.connector.*)
 print sc._jvm.CassandraRow()




 CassandraRow corresponds to
 https://github.com/datastax/spark-cassandra-connector/blob/master/spark-cassandra-connector/src/main/scala/com/datastax/spark/connector/CassandraRow.scala
 which is included in the JAR I submit. Feel free to download the JAR here
 https://dl.dropboxusercontent.com/u/4385786/pyspark-cassandra-0.1.0-SNAPSHOT-standalone.jar

 I'm currently running this Python example with:



 spark-submit
 --driver-class-path=/path/to/pyspark-cassandra-0.1.0-SNAPSHOT-standalone.jar
 --verbose src/python/cassandara_example.py



 But continually get the following error indicating that the classes aren't
 in fact on the classpath of the GatewayServer:



 Traceback (most recent call last):
   File
 /Users/mikesukmanowsky/Development/parsely/pyspark-cassandra/src/python/cassandara_example.py,
 line 37, in module
 main()
   File
 /Users/mikesukmanowsky/Development/parsely/pyspark-cassandra/src/python/cassandara_example.py,
 line 25, in main
 print sc._jvm.CassandraRow()
   File
 /Users/mikesukmanowsky/.opt/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py,
 line 726, in __getattr__
 py4j.protocol.Py4JError: Trying to call a package.



 The correct response from the GatewayServer should be:


 In [22]: gateway.jvm.CassandraRow()
 Out[22]: JavaObject id=o0



 Also tried using --jars option instead and that doesn't seem to work
 either. Is there something I'm missing as to why the classes aren't
 available?


 --
 Mike Sukmanowsky
 Aspiring Digital Carpenter

 *p*: +1 (416) 953-4248
 *e*: mike.sukmanow...@gmail.com

 facebook http://facebook.com/mike.sukmanowsky | twitter
 http://twitter.com/msukmanowsky | LinkedIn
 http://www.linkedin.com/profile/view?id=10897143 | github
 https://github.com/msukmanowsky




Spark Cassandra Connector proper usage

2014-10-23 Thread Ashic Mahtab
I'm looking to use spark for some ETL, which will mostly consist of update 
statements (a column is a set, that'll be appended to, so a simple insert is 
likely not going to work). As such, it seems like issuing CQL queries to import 
the data is the best option. Using the Spark Cassandra Connector, I see I can 
do this:
https://github.com/datastax/spark-cassandra-connector/blob/master/doc/1_connecting.md#connecting-manually-to-cassandra
Now I don't want to open a session and close it for every row in the source (am 
I right in not wanting this? Usually, I have one session for the entire 
process, and keep using that in normal apps). However, it says that the 
connector is serializable, but the session is obviously not. So, wrapping the 
whole import inside a single withSessionDo seems like it'll cause problems. I 
was thinking of using something like this:
class CassandraStorage(conf:SparkConf) {
  val session = CassandraConnector(conf).openSession()
  def store (t:Thingy) : Unit = {
//session.execute cql goes here
  }
}

Is this a good approach? Do I need to worry about closing the session? Where / 
how best would I do that? Any pointers are appreciated.
Thanks,Ashic.
  

RE: Spark Cassandra Connector proper usage

2014-10-23 Thread Ashic Mahtab
Hi Gerard,
Thanks for the response. Here's the scenario:

The target cassandra schema looks like this:

create table foo (
   id text primary key,
   bar int,
   things set<text>
)

The source in question is a Sql Server source providing the necessary data. The 
source goes over the same id multiple times adding things to the things set 
each time. With inserts, it'll replace things with a new set of one element, 
instead of appending that item. As such, the query

update foo set things = things + ? where id=?

solves the problem. If I had to stick with saveToCassandra, I'd have to read 
in the existing row for each row, and then write it back. Since this is 
happening in parallel on multiple machines, that would likely cause 
discrepancies where a node will read and update to older values. Hence my 
question about session management in order to issue custom update queries.

Thanks,
Ashic.

Date: Thu, 23 Oct 2014 14:27:47 +0200
Subject: Re: Spark Cassandra Connector proper usage
From: gerard.m...@gmail.com
To: as...@live.com

Ashic,

With the Spark-cassandra connector you would typically create an RDD from the 
source table, update what you need,  filter out what you don't update and write 
it back to Cassandra.
Kr, Gerard
On Oct 23, 2014 1:21 PM, Ashic Mahtab as...@live.com wrote:



I'm looking to use spark for some ETL, which will mostly consist of update 
statements (a column is a set, that'll be appended to, so a simple insert is 
likely not going to work). As such, it seems like issuing CQL queries to import 
the data is the best option. Using the Spark Cassandra Connector, I see I can 
do this:
https://github.com/datastax/spark-cassandra-connector/blob/master/doc/1_connecting.md#connecting-manually-to-cassandra
Now I don't want to open a session and close it for every row in the source (am 
I right in not wanting this? Usually, I have one session for the entire 
process, and keep using that in normal apps). However, it says that the 
connector is serializable, but the session is obviously not. So, wrapping the 
whole import inside a single withSessionDo seems like it'll cause problems. I 
was thinking of using something like this:
class CassandraStorage(conf:SparkConf) {
  val session = CassandraConnector(conf).openSession()
  def store (t:Thingy) : Unit = {
//session.execute cql goes here
  }
}

Is this a good approach? Do I need to worry about closing the session? Where / 
how best would I do that? Any pointers are appreciated.
Thanks,Ashic.
  
  

Re: Spark Cassandra Connector proper usage

2014-10-23 Thread Gerard Maas
Hi Ashic,

At the moment I see two options:

1) You could  use the CassandraConnector object to execute your specialized
query. The recommended pattern is to to that within a
rdd.foreachPartition(...) in order to amortize DB connection setup over the
number of elements in on partition.   Something like this:

val sparkContext = ???
val cassandraConnector = CassandraConnector(conf)
val dataRdd = ??? // I assume this is the source of data
val rddThingById = dataRdd.map(elem => transformToIdByThing(elem))
rddThingById.foreachPartition(partition => {
  cassandraConnector.withSessionDo { session =>
    partition.foreach(record =>
      session.execute("update foo set things = things + ? where id=?", record.thing, record.id))
  }
})

2) You could change your datamodel slightly in order to avoid the update
operation.
Actually, the cassandra representation of a set is nothing more than a
column -> timestamp mapping, where the column name is an element of the set.
So Set(a,b,c) = Column(a) -> ts, Column(b) -> ts, Column(c) -> ts

So, if you desugarize your datamodel, you could use something like:
create table foo (
   id text,
   bar int,
   things text,
   ts timestamp,
   primary key ((id, bar), things)
)

And your Spark process would be reduced to:
val sparkContext = ???
val dataRdd = ??? // I assume this is the source of data
dataRdd.map(elem => transformToIdBarThingByTimeStamp(elem))
  .saveToCassandra("ks", "foo", SomeColumns("id", "bar", "thing", "ts"))


Hope this helps.

-kr, Gerard.











On Thu, Oct 23, 2014 at 2:48 PM, Ashic Mahtab as...@live.com wrote:

 Hi Gerard,
 Thanks for the response. Here's the scenario:

 The target cassandra schema looks like this:

 create table foo (
id text primary key,
bar int,
 things set<text>
 )

 The source in question is a Sql Server source providing the necessary
 data. The source goes over the same id multiple times adding things to
 the things set each time. With inserts, it'll replace things with a new
 set of one element, instead of appending that item. As such, the query

 update foo set things = things + ? where id=?

 solves the problem. If I had to stick with saveToCassandra, I'd have to
 read in the existing row for each row, and then write it back. Since this
 is happening in parallel on multiple machines, that would likely cause
 discrepancies where a node will read and update to older values. Hence my
 question about session management in order to issue custom update queries.

 Thanks,
 Ashic.

 --
 Date: Thu, 23 Oct 2014 14:27:47 +0200
 Subject: Re: Spark Cassandra Connector proper usage
 From: gerard.m...@gmail.com
 To: as...@live.com


 Ashic,
 With the Spark-cassandra connector you would typically create an RDD from
 the source table, update what you need,  filter out what you don't update
 and write it back to Cassandra.

 Kr, Gerard
 On Oct 23, 2014 1:21 PM, Ashic Mahtab as...@live.com wrote:

 I'm looking to use spark for some ETL, which will mostly consist of
 update statements (a column is a set, that'll be appended to, so a simple
 insert is likely not going to work). As such, it seems like issuing CQL
 queries to import the data is the best option. Using the Spark Cassandra
 Connector, I see I can do this:



 https://github.com/datastax/spark-cassandra-connector/blob/master/doc/1_connecting.md#connecting-manually-to-cassandra
 https://github.com/datastax/spark-cassandra-connector/blob/master/doc/1_connecting.md#connecting-manually-to-cassandra
 https://github.com/datastax/spark-cassandra-connector/blob/master/doc/1_connecting.md#connecting-manually-to-cassandra



 https://github.com/datastax/spark-cassandra-connector/blob/master/doc/1_connecting.md#connecting-manually-to-cassandra

 Now I don't want to open a session and close it for every row in the
 source (am I right in not wanting this? Usually, I have one session for the
 entire process, and keep using that in normal apps). However, it says
 that the connector is serializable, but the session is obviously not. So,
 wrapping the whole import inside a single withSessionDo seems like it'll
 cause problems. I was thinking of using something like this:


 class CassandraStorage(conf:SparkConf) {
   val session = CassandraConnector(conf).openSession()
   def store (t:Thingy) : Unit = {
 //session.execute cql goes here
   }
 }


 Is this a good approach? Do I need to worry about closing the session?
 Where / how best would I do that? Any pointers are appreciated.


 Thanks,

 Ashic.




RE: Spark Cassandra Connector proper usage

2014-10-23 Thread Ashic Mahtab
Hi Gerard,
I've gone with option 1, and seems to be working well. Option 2 is also quite 
interesting. Thanks for your help in this.

Regards,
Ashic.

From: gerard.m...@gmail.com
Date: Thu, 23 Oct 2014 17:07:56 +0200
Subject: Re: Spark Cassandra Connector proper usage
To: as...@live.com
CC: user@spark.apache.org

Hi Ashic,
At the moment I see two options:

1) You could use the CassandraConnector object to execute your specialized
query. The recommended pattern is to do that within a rdd.foreachPartition(...)
in order to amortize DB connection setup over the number of elements in one
partition. Something like this:

val sparkContext = ???
val cassandraConnector = CassandraConnector(conf)
val dataRdd = ??? // I assume this is the source of data
val rddThingById = dataRdd.map(elem => transformToIdByThing(elem))
rddThingById.foreachPartition(partition => {
  cassandraConnector.withSessionDo { session =>
    partition.foreach(record =>
      session.execute("update foo set things = things + ? where id=?", record.thing, record.id))
  }
})

2) You could change your datamodel slightly in order to avoid the update
operation.
Actually, the cassandra representation of a set is nothing more than a
column -> timestamp mapping, where the column name is an element of the set.
So Set(a,b,c) = Column(a) -> ts, Column(b) -> ts, Column(c) -> ts

So, if you desugarize your datamodel, you could use something like:
create table foo (
   id text,
   bar int,
   things text,
   ts timestamp,
   primary key ((id, bar), things)
)

And your Spark process would be reduced to:
val sparkContext = ???
val dataRdd = ??? // I assume this is the source of data
dataRdd.map(elem => transformToIdBarThingByTimeStamp(elem))
  .saveToCassandra("ks", "foo", SomeColumns("id", "bar", "thing", "ts"))

Hope this helps.
-kr, Gerard.

 



 
 
On Thu, Oct 23, 2014 at 2:48 PM, Ashic Mahtab as...@live.com wrote:



Hi Gerard,
Thanks for the response. Here's the scenario:

The target cassandra schema looks like this:

create table foo (
   id text primary key,
   bar int,
   things set<text>
)

The source in question is a Sql Server source providing the necessary data. The 
source goes over the same id multiple times adding things to the things set 
each time. With inserts, it'll replace things with a new set of one element, 
instead of appending that item. As such, the query

update foo set things = things + ? where id=?

solves the problem. If I had to stick with saveToCassandra, I'd have to read 
in the existing row for each row, and then write it back. Since this is 
happening in parallel on multiple machines, that would likely cause 
discrepancies where a node will read and update to older values. Hence my 
question about session management in order to issue custom update queries.

Thanks,
Ashic.

Date: Thu, 23 Oct 2014 14:27:47 +0200
Subject: Re: Spark Cassandra Connector proper usage
From: gerard.m...@gmail.com
To: as...@live.com

Ashic,

With the Spark-cassandra connector you would typically create an RDD from the 
source table, update what you need,  filter out what you don't update and write 
it back to Cassandra.
Kr, Gerard
On Oct 23, 2014 1:21 PM, Ashic Mahtab as...@live.com wrote:



I'm looking to use spark for some ETL, which will mostly consist of update 
statements (a column is a set, that'll be appended to, so a simple insert is 
likely not going to work). As such, it seems like issuing CQL queries to import 
the data is the best option. Using the Spark Cassandra Connector, I see I can 
do this:
https://github.com/datastax/spark-cassandra-connector/blob/master/doc/1_connecting.md#connecting-manually-to-cassandra
Now I don't want to open a session and close it for every row in the source (am 
I right in not wanting this? Usually, I have one session for the entire 
process, and keep using that in normal apps). However, it says that the 
connector is serializable, but the session is obviously not. So, wrapping the 
whole import inside a single withSessionDo seems like it'll cause problems. I 
was thinking of using something like this:
class CassandraStorage(conf:SparkConf) {
  val session = CassandraConnector(conf).openSession()
  def store (t:Thingy) : Unit = {
//session.execute cql goes here
  }
}

Is this a good approach? Do I need to worry about closing the session? Where / 
how best would I do that? Any pointers are appreciated.
Thanks,Ashic.
  
  

  

Spark Cassandra connector issue

2014-10-21 Thread Ankur Srivastava
Hi,

I am creating a cassandra java rdd and transforming it using the where
clause.

It works fine when I run it outside the mapValues, but when I put the code
in mapValues I get an error while creating the transformation.

Below is my sample code:

  CassandraJavaRDD<ReferenceData> cassandraRefTable = javaFunctions(sc)
      .cassandraTable("reference_data", "dept_reference_data", ReferenceData.class);

  JavaPairRDD<String, Employee> joinedRdd = rdd.mapValues(new Function<Employee, Employee>() {

    public Employee call(Employee employee) throws Exception {
      ReferenceData data = null;
      if (employee.getDepartment() != null) {
        data = referenceTable.where("postal_plus=?", location.getPostalPlus()).first();
        System.out.println(data.toCSV());
      }
      if (data != null) {
        // call setters on employee
      }
      return employee;
    }
  });

I get this error:

java.lang.NullPointerException

at org.apache.spark.rdd.RDD.init(RDD.scala:125)

at com.datastax.spark.connector.rdd.CassandraRDD.init(
CassandraRDD.scala:47)

at com.datastax.spark.connector.rdd.CassandraRDD.copy(CassandraRDD.scala:70)

at com.datastax.spark.connector.rdd.CassandraRDD.where(CassandraRDD.scala:77
)

 at com.datastax.spark.connector.rdd.CassandraJavaRDD.where(
CassandraJavaRDD.java:54)


Thanks for help!!



Regards

Ankur


Using the DataStax Cassandra Connector from PySpark

2014-10-21 Thread Mike Sukmanowsky
Hi there,

I'm using Spark 1.1.0 and experimenting with trying to use the DataStax
Cassandra Connector (https://github.com/datastax/spark-cassandra-connector)
from within PySpark.

As a baby step, I'm simply trying to validate that I have access to classes
that I'd need via Py4J. Sample python program:


from py4j.java_gateway import java_import

from pyspark.conf import SparkConf
from pyspark import SparkContext

conf = SparkConf().set(spark.cassandra.connection.host, 127.0.0.1)
sc = SparkContext(appName=Spark + Cassandra Example, conf=conf)
java_import(sc._gateway.jvm, com.datastax.spark.connector.*)
print sc._jvm.CassandraRow()




CassandraRow corresponds to
https://github.com/datastax/spark-cassandra-connector/blob/master/spark-cassandra-connector/src/main/scala/com/datastax/spark/connector/CassandraRow.scala
which is included in the JAR I submit. Feel free to download the JAR here
https://dl.dropboxusercontent.com/u/4385786/pyspark-cassandra-0.1.0-SNAPSHOT-standalone.jar

I'm currently running this Python example with:



spark-submit
--driver-class-path=/path/to/pyspark-cassandra-0.1.0-SNAPSHOT-standalone.jar
--verbose src/python/cassandara_example.py



But continually get the following error indicating that the classes aren't
in fact on the classpath of the GatewayServer:



Traceback (most recent call last):
  File
/Users/mikesukmanowsky/Development/parsely/pyspark-cassandra/src/python/cassandara_example.py,
line 37, in module
main()
  File
/Users/mikesukmanowsky/Development/parsely/pyspark-cassandra/src/python/cassandara_example.py,
line 25, in main
print sc._jvm.CassandraRow()
  File
/Users/mikesukmanowsky/.opt/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py,
line 726, in __getattr__
py4j.protocol.Py4JError: Trying to call a package.



The correct response from the GatewayServer should be:


In [22]: gateway.jvm.CassandraRow()
Out[22]: JavaObject id=o0



Also tried using --jars option instead and that doesn't seem to work
either. Is there something I'm missing as to why the classes aren't
available?


-- 
Mike Sukmanowsky
Aspiring Digital Carpenter

*p*: +1 (416) 953-4248
*e*: mike.sukmanow...@gmail.com

facebook http://facebook.com/mike.sukmanowsky | twitter
http://twitter.com/msukmanowsky | LinkedIn
http://www.linkedin.com/profile/view?id=10897143 | github
https://github.com/msukmanowsky


Re: Spark Cassandra connector issue

2014-10-21 Thread Ankur Srivastava
Is this because I am calling a transformation function on an rdd from
inside another transformation function?

Is it not allowed?
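One workaround I am considering, since nested RDD operations are not supported (an
RDD such as the CassandraJavaRDD above cannot be referenced inside another RDD's
closure, which would fit the NullPointerException from RDD's constructor), is to
query the reference table directly through CassandraConnector inside mapPartitions.
A rough sketch in Scala; the record fields, the SELECT and the output shape are
illustrative assumptions, and the query presumes dept is queryable as written:

import com.datastax.spark.connector.cql.CassandraConnector
import org.apache.spark.{SparkConf, SparkContext}

object LookupInsideTaskSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("lookup-inside-task-sketch")
      .set("spark.cassandra.connection.host", "127.0.0.1")   // placeholder
    val sc = new SparkContext(conf)
    val connector = CassandraConnector(conf)                 // serializable, usable in closures

    val employees = sc.parallelize(Seq(("e1", "sales"), ("e2", "hr")))  // (id, dept) stand-ins

    val joined = employees.mapPartitions { part =>
      connector.withSessionDo { session =>
        val stmt = session.prepare(
          "SELECT * FROM reference_data.dept_reference_data WHERE dept = ?")
        // Materialize before withSessionDo returns the session to the cache.
        part.map { case (id, dept) =>
          val row = session.execute(stmt.bind(dept)).one()   // null when there is no match
          (id, dept, Option(row).map(_.toString))
        }.toList
      }.iterator
    }

    joined.collect().foreach(println)
    sc.stop()
  }
}

Would that be the recommended approach here?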

Thanks
Ankut
On Oct 21, 2014 1:59 PM, Ankur Srivastava ankur.srivast...@gmail.com
wrote:

 Hi Gerard,

 this is the code that may be helpful.

 public class ReferenceDataJoin implements Serializable {


  private static final long serialVersionUID = 1039084794799135747L;

 JavaPairRDDString, Employee rdd;

 CassandraJavaRDDReferenceData referenceTable;


  public PostalReferenceDataJoin(ListEmployee employees) {

  JavaSparkContext sc =
 SparkContextFactory.getSparkContextFactory().getSparkContext();

  this.rdd = sc.parallelizePairs(employees);

  this. referenceTable = javaFunctions(sc).cassandraTable(reference_data,

  “dept_reference_data, ReferenceData.class);

 }


  public JavaPairRDDString, Employee execute() {

 JavaPairRDDString, Employee joinedRdd = rdd

 .mapValues(new FunctionEmployee, Employee() {

 private static final long serialVersionUID = -226016490083377260L;


 @Override

 public Employee call(Employee employee)

 throws Exception {

 ReferenceData data = null;

 if (employee.getDepartment() != null) {

 data = referenceTable.where(“dept=?,

 employee.getDepartment()).first();;

 System.out.println(employee.getDepartment() +  + data);

 }

 if (data != null) {

 //setters on employee

 }

 return employee;

 }

   });

  return joinedRdd;

 }


 }


 Thanks
 Ankur

 On Tue, Oct 21, 2014 at 11:11 AM, Gerard Maas gerard.m...@gmail.com
 wrote:

 Looks like that code does not correspond to the problem you're facing. I
 doubt it would even compile.
 Could you post the actual code?

 -kr, Gerard
 On Oct 21, 2014 7:27 PM, Ankur Srivastava ankur.srivast...@gmail.com
 wrote:

 Hi,

 I am creating a cassandra java rdd and transforming it using the where
 clause.

 It works fine when I run it outside the mapValues, but when I put the
 code in mapValues I get an error while creating the transformation.

 Below is my sample code:

   CassandraJavaRDDReferenceData cassandraRefTable = javaFunctions(sc
 ).cassandraTable(reference_data,

  dept_reference_data, ReferenceData.class);

 JavaPairRDDString, Employee joinedRdd = rdd.mapValues(new
 FunctionIPLocation, IPLocation() {

  public Employee call(Employee employee) throws Exception {

  ReferenceData data = null;

  if(employee.getDepartment() != null) {

data = referenceTable.where(postal_plus=?, location
 .getPostalPlus()).first();

System.out.println(data.toCSV());

  }

 if(data != null) {

   //call setters on employee

 }

 return employee;

  }

 }

 I get this error:

 java.lang.NullPointerException

 at org.apache.spark.rdd.RDD.init(RDD.scala:125)

 at com.datastax.spark.connector.rdd.CassandraRDD.init(
 CassandraRDD.scala:47)

 at com.datastax.spark.connector.rdd.CassandraRDD.copy(
 CassandraRDD.scala:70)

 at com.datastax.spark.connector.rdd.CassandraRDD.where(
 CassandraRDD.scala:77)

  at com.datastax.spark.connector.rdd.CassandraJavaRDD.where(
 CassandraJavaRDD.java:54)


 Thanks for help!!



 Regards

 Ankur





Spark Cassandra Connector Issue and performance

2014-09-24 Thread pouryas
Hey all

I tried the Spark connector with Cassandra and ran into a problem that I was
blocked on for a couple of weeks. I managed to find a solution to the problem,
but I am not sure whether it was a bug in the connector/Spark or not.

I had three tables in Cassandra (running Cassandra on a 5-node cluster) and a
large Spark cluster (5 worker nodes, each with 32 cores and 240G of memory).

When I ran my job, which extracts data from S3 and writes to 3 tables in
Cassandra using around 1TB of memory and 160 cores, sometimes the job got
stuck at the last few tasks of a stage...

After playing around for a while I realised that reducing the number of cores to
2 per machine (10 total) made the job stable. I gradually increased the
number of cores and it hung again once I had about 50 cores in total.

I would like to know if anyone else has experienced this and whether it is
explainable.


On another note, I would like to know if people are seeing good performance
reading from Cassandra using Spark as opposed to reading data from HDFS. Kind
of an open question, but I would like to see how others are using it.





--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Cassandra-Connector-Issue-and-performance-tp15005.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Anyone have successful recipe for spark cassandra connector?

2014-09-19 Thread gzoller
I'm running out of options trying to integrate cassandra, spark, and the
spark-cassandra-connector.

I quickly found out just grabbing the latest versions of everything
(drivers, etc.) doesn't work--binary incompatibilities it would seem.

So last I tried using versions of drivers from the
spark-cassandra-connector's build.  Better, but still no dice.
Any successes out there?  I'd really love to use the stack.

If curious my ridiculously trivial example is here: 
https://github.com/gzoller/doesntwork

If you run 'sbt test' you'll get a NoHostAvailableException complaining that it
tried /10.0.0.194:9042. I have no idea where that address came from; I was
trying to connect to a local node.
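One thing I plan to check first (a sketch, not a diagnosis of where 10.0.0.194 comes
from): pinning the contact point explicitly in the configuration the test builds, so
the connector cannot fall back to whatever the host name resolves to. Everything
below assumes a single local Cassandra 2.0.x node listening on 127.0.0.1:

import com.datastax.spark.connector._
import org.apache.spark.{SparkConf, SparkContext}

object LocalConnectSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("local-connect-sketch")
      .setMaster("local[2]")
      .set("spark.cassandra.connection.host", "127.0.0.1")
    val sc = new SparkContext(conf)

    // Read a system table as a smoke test for connectivity (valid on Cassandra 2.x).
    val rowCount = sc.cassandraTable("system", "schema_keyspaces").count()
    println(s"connected, keyspace rows = $rowCount")
    sc.stop()
  }
}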

Any ideas appreciated!
Greg



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Anyone-have-successful-recipe-for-spark-cassandra-connector-tp14681.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



problem in using Spark-Cassandra connector

2014-09-11 Thread Karunya Padala

Hi,

I am new to Spark. I encountered an issue when trying to connect to Cassandra
using the Spark Cassandra connector. Can anyone help me? Following are the details.

1) Following Spark and Cassandra versions I am using on LUbuntu12.0.
i)spark-1.0.2-bin-hadoop2
ii) apache-cassandra-2.0.10

2) In the Cassandra, i created a key space, table and inserted some data.

3)Following libs are specified when starting the spark-shell.
antworks@INHN1I-DW1804:$ spark-shell --jars 
/home/antworks/lib-spark-cassandra/apache-cassandra-clientutil-2.0.10.jar,/home/antworks/lib-spark-cassandra/apache-cassandra-thrift-2.0.10.jar,/home/antworks/lib-spark-cassandra/cassandra-driver-core-2.0.2.jar,/home/antworks/lib-spark-cassandra/guava-15.0.jar,/home/antworks/lib-spark-cassandra/joda-convert-1.6.jar,/home/antworks/lib-spark-cassandra/joda-time-2.3.jar,/home/antworks/lib-spark-cassandra/libthrift-0.9.1.jar,/home/antworks/lib-spark-cassandra/spark-cassandra-connector_2.10-1.0.0-rc3.jar

4) When running the statement val rdd = sc.cassandraTable("EmailKeySpace", "Emails")
I encountered the following issue:

My application connects to Cassandra, immediately disconnects, and
throws java.io.IOException: Table not found: EmailKeySpace.Emails
Here is the stack trace.

scala> import com.datastax.spark.connector._
import com.datastax.spark.connector._

scala> val rdd = sc.cassandraTable("EmailKeySpace", "Emails")
14/09/11 23:06:51 WARN FrameCompressor: Cannot find LZ4 class, you should make 
sure the LZ4 library is in the classpath if you intend to use it. LZ4 
compression will not be available for the protocol.
14/09/11 23:06:51 INFO Cluster: New Cassandra host /172.23.1.68:9042 added
14/09/11 23:06:51 INFO CassandraConnector: Connected to Cassandra cluster: 
AWCluster
14/09/11 23:06:52 INFO CassandraConnector: Disconnected from Cassandra cluster: 
AWCluster
java.io.IOException: Table not found: EmailKeySpace.Emails
at 
com.datastax.spark.connector.rdd.CassandraRDD.tableDef$lzycompute(CassandraRDD.scala:208)
at 
com.datastax.spark.connector.rdd.CassandraRDD.tableDef(CassandraRDD.scala:205)
at 
com.datastax.spark.connector.rdd.CassandraRDD.init(CassandraRDD.scala:212)
at 
com.datastax.spark.connector.SparkContextFunctions.cassandraTable(SparkContextFunctions.scala:48)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.init(console:15)
at $iwC$$iwC$$iwC$$iwC$$iwC.init(console:20)
at $iwC$$iwC$$iwC$$iwC.init(console:22)
at $iwC$$iwC$$iwC.init(console:24)
at $iwC$$iwC.init(console:26)
at $iwC.init(console:28)
at init(console:30)
at .init(console:34)
at .clinit(console)
at .init(console:7)
at .clinit(console)
at $print(console)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:788)
at 
org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1056)
at 
org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:614)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:645)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:609)
at 
org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:796)
at 
org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:841)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:753)
at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:601)
at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:608)
at org.apache.spark.repl.SparkILoop.loop(SparkILoop.scala:611)
at 
org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:936)
at 
org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:884)
at 
org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:884)
at 
scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:884)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:982)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:303

Re: problem in using Spark-Cassandra connector

2014-09-11 Thread Reddy Raja
You will have to create the keyspace and the table.
See the message:
Table not found: EmailKeySpace.Emails

It looks like you have not created the Emails table.



RE: problem in using Spark-Cassandra connector

2014-09-11 Thread Karunya Padala
I have created a keyspace called EmailKeySpace and a table called Emails, and
inserted some data in Cassandra. See my Cassandra console screenshot.


[Cassandra console screenshot (inline image not preserved in the archive)]


Regards,
Karunya.

From: Reddy Raja [mailto:areddyr...@gmail.com]
Sent: 11 September 2014 18:07
To: Karunya Padala
Cc: u...@spark.incubator.apache.org
Subject: Re: problem in using Spark-Cassandra connector

You will have to create the keyspace and the table.
See the message:
Table not found: EmailKeySpace.Emails

It looks like you have not created the Emails table.



Cassandra connector

2014-09-10 Thread wwilkins
Hi,
I am having difficulty getting the Cassandra connector running within the spark 
shell.

My jars look like this:
[wwilkins@phel-spark-001 logs]$ ls -altr /opt/connector/
total 14588
drwxr-xr-x. 5 root root4096 Sep  9 22:15 ..
-rw-r--r--  1 root root  242809 Sep  9 22:20 
spark-cassandra-connector-master.zip
-rw-r--r--  1 root root  541332 Sep  9 22:20 cassandra-driver-core-2.0.3.jar
-rw-r--r--  1 root root 1855685 Sep  9 22:20 cassandra-thrift-2.0.9.jar
-rw-r--r--  1 root root   30085 Sep  9 22:20 commons-codec-1.2.jar
-rw-r--r--  1 root root  315805 Sep  9 22:20 commons-lang3-3.1.jar
-rw-r--r--  1 root root   60686 Sep  9 22:20 commons-logging-1.1.1.jar
-rw-r--r--  1 root root 2228009 Sep  9 22:20 guava-16.0.1.jar
-rw-r--r--  1 root root  433368 Sep  9 22:20 httpclient-4.2.5.jar
-rw-r--r--  1 root root  227275 Sep  9 22:20 httpcore-4.2.4.jar
-rw-r--r--  1 root root 1222059 Sep  9 22:20 ivy-2.3.0.jar.bak
-rw-r--r--  1 root root   38460 Sep  9 22:20 joda-convert-1.2.jar.bak
-rw-r--r--  1 root root   98818 Sep  9 22:20 joda-convert-1.6.jar
-rw-r--r--  1 root root  581571 Sep  9 22:20 joda-time-2.3.jar
-rw-r--r--  1 root root  217053 Sep  9 22:20 libthrift-0.9.1.jar
-rw-r--r--  1 root root 618 Sep  9 22:20 log4j.properties
-rw-r--r--  1 root root  165505 Sep  9 22:20 lz4-1.2.0.jar
-rw-r--r--  1 root root   85448 Sep  9 22:20 metrics-core-3.0.2.jar
-rw-r--r--  1 root root 1231993 Sep  9 22:20 netty-3.9.0.Final.jar
-rw-r--r--  1 root root   26083 Sep  9 22:20 slf4j-api-1.7.2.jar.bak
-rw-r--r--  1 root root   26084 Sep  9 22:20 slf4j-api-1.7.5.jar
-rw-r--r--  1 root root 1251514 Sep  9 22:20 snappy-java-1.0.5.jar
-rw-r--r--  1 root root  776782 Sep  9 22:20 
spark-cassandra-connector_2.10-1.0.0-beta1.jar
-rw-r--r--  1 root root  997458 Sep  9 22:20 
spark-cassandra-connector_2.10-1.0.0-SNAPSHOT.jar.bak3
-rwxr--r--  1 root root 1113208 Sep  9 22:20 
spark-cassandra-connector_2.10-1.1.0-SNAPSHOT.jar.bak
-rw-r--r--  1 root root 804 Sep  9 22:20 
spark-cassandra-connector_2.10-1.1.0-SNAPSHOT.jar.bak2


I launch the shell with this command:
/data/spark/bin/spark-shell --driver-class-path $(echo /opt/connector/*.jar 
|sed 's/ /:/g')


I run these commands:
sc.stop
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import com.datastax.spark.connector._

val conf = new SparkConf()
conf.set("spark.cassandra.connection.host", "10.208.59.164")
val sc = new SparkContext(conf)
val table = sc.cassandraTable("retail", "ordercf")


And I get this problem:
scala> val table = sc.cassandraTable("retail", "ordercf")
java.lang.AbstractMethodError
at org.apache.spark.Logging$class.log(Logging.scala:52)
at 
com.datastax.spark.connector.cql.CassandraConnector$.log(CassandraConnector.scala:145)
at org.apache.spark.Logging$class.logDebug(Logging.scala:63)
at 
com.datastax.spark.connector.cql.CassandraConnector$.logDebug(CassandraConnector.scala:145)
at 
com.datastax.spark.connector.cql.CassandraConnector$.com$datastax$spark$connector$cql$CassandraConnector$$createCluster(CassandraConnector.scala:155)
at 
com.datastax.spark.connector.cql.CassandraConnector$$anonfun$4.apply(CassandraConnector.scala:152)
at 
com.datastax.spark.connector.cql.CassandraConnector$$anonfun$4.apply(CassandraConnector.scala:152)
at 
com.datastax.spark.connector.cql.RefCountedCache.createNewValueAndKeys(RefCountedCache.scala:36)
at 
com.datastax.spark.connector.cql.RefCountedCache.acquire(RefCountedCache.scala:61)
at 
com.datastax.spark.connector.cql.CassandraConnector.withClusterDo(CassandraConnector.scala:72)
at 
com.datastax.spark.connector.cql.Schema.keyspaces$lzycompute(Schema.scala:111)
at com.datastax.spark.connector.cql.Schema.keyspaces(Schema.scala:110)
at 
com.datastax.spark.connector.cql.Schema.tables$lzycompute(Schema.scala:123)
at com.datastax.spark.connector.cql.Schema.tables(Schema.scala:122)
at 
com.datastax.spark.connector.rdd.CassandraRDD.tableDef$lzycompute(CassandraRDD.scala:196)
at 
com.datastax.spark.connector.rdd.CassandraRDD.tableDef(CassandraRDD.scala:195)
at 
com.datastax.spark.connector.rdd.CassandraRDD.init(CassandraRDD.scala:202)
at 
com.datastax.spark.connector.package$SparkContextFunctions.cassandraTable(package.scala:94)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.init(console:22)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.init(console:27)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.init(console:29)
at $iwC$$iwC$$iwC$$iwC$$iwC.init(console:31)
at $iwC$$iwC$$iwC$$iwC.init(console:33)
at $iwC$$iwC$$iwC.init(console:35)
at $iwC$$iwC.init(console:37)
at $iwC.init(console:39)
at init(console:41)
at .init(console:45)
at .clinit(console)
at .init(console:7)
at .clinit(console)
at $print(console

Re: Cassandra connector

2014-09-10 Thread gtinside
Are you using Spark 1.1? If yes, you will have to update the DataStax
Cassandra connector code and remove the references to the log methods in
CassandraConnector.scala.

Regards,
Gaurav
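
For context, the AbstractMethodError appears to stem from a binary-incompatible
change to Spark's internal Logging trait between 1.0 and 1.1, which the
connector code inherited from. As a rough illustration of the kind of edit
being described (this is not the connector's actual source; the object and
method names are hypothetical), the inherited log calls can be replaced with a
logger the class constructs itself:

import org.slf4j.{Logger, LoggerFactory}

object ConnectorLoggingSketch {
  // a locally constructed SLF4J logger stands in for the inherited log()/logDebug() methods
  private val logger: Logger = LoggerFactory.getLogger(getClass)

  def createClusterSketch(hosts: Seq[String]): Unit = {
    // where the original code called logDebug(...), log through the local logger instead
    logger.debug(s"Connecting to Cassandra cluster at ${hosts.mkString(", ")}")
  }
}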



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Cassandra-connector-tp13896p13897.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



RE: Cassandra connector

2014-09-10 Thread Wade Wilkins
Thank you!

I am running Spark 1.0, but your suggestion worked for me.
I commented out (//) every logDebug call in both
CassandraConnector.scala
and
Schema.scala

I am moving again. 

Regards,
Wade

-Original Message-
From: gtinside [mailto:gtins...@gmail.com] 
Sent: Wednesday, September 10, 2014 8:49 AM
To: u...@spark.incubator.apache.org
Subject: Re: Cassandra connector

Are you using Spark 1.1? If yes, you will have to update the DataStax
Cassandra connector code and remove the references to the log methods in
CassandraConnector.scala.

Regards,
Gaurav



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Cassandra-connector-tp13896p13897.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional 
commands, e-mail: user-h...@spark.apache.org


-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Spark-cassandra-connector 1.0.0-rc5: java.io.NotSerializableException

2014-09-05 Thread Shing Hing Man

Hi,

My version of Spark is 1.0.2.
I am trying to use the spark-cassandra-connector to execute a CQL update
statement inside a CassandraConnector(conf).withSessionDo block:

CassandraConnector(conf).withSessionDo { session =>
  myRdd.foreach {
    case (ip, values) =>
      session.execute( {a CQL update statement} )
  }
}

 The above code is in the main method of the driver object.
 When I run it (in local mode), I get an exception:
 
  java.io.NotSerializableException: 
com.datastax.spark.connector.cql.SessionProxy
  
 Exception in thread main org.apache.spark.SparkException: Job aborted due to 
stage failure: Task not serializable: java.io.NotSerializableException: 
com.datastax.spark.connector.cql.SessionProxy
at 
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1049)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1033)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1031)
at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at 
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1031)
at 
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitMissingTasks(DAGScheduler.scala:772)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskCompletion$16$$anonfun$apply$1.apply$mcVI$sp(DAGScheduler.scala:906)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskCompletion$16$$anonfun$apply$1.apply(DAGScheduler.scala:903)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskCompletion$16$$anonfun$apply$1.apply(DAGScheduler.scala:903)
  
  
Is there a way to use an RDD inside a CassandraConnector(conf).withSessionDo
block?

Thanks in advance for any assistance !
 
Shing
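
A common way around this, sketched under the assumption that myRdd is an
RDD[(String, Int)] and that the keyspace, table, and column names below are
hypothetical: capture the serializable CassandraConnector in the closure and
open the Session on the executors via foreachPartition, instead of iterating
the RDD inside withSessionDo (which drags the non-serializable SessionProxy
into the task closure).

import com.datastax.spark.connector.cql.CassandraConnector
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD

object WithSessionDoSketch {
  def updatePerPartition(conf: SparkConf, myRdd: RDD[(String, Int)]): Unit = {
    // CassandraConnector is serializable, so it is safe to close over it
    val connector = CassandraConnector(conf)
    myRdd.foreachPartition { rows =>
      // one Session per partition, created on the executor, never shipped from the driver
      connector.withSessionDo { session =>
        rows.foreach { case (ip, value) =>
          // hypothetical table: my_ks.my_table(ip text PRIMARY KEY, value int)
          session.execute(
            "UPDATE my_ks.my_table SET value = ? WHERE ip = ?",
            Int.box(value), ip)
        }
      }
    }
  }
}

When the write maps onto a plain insert, the simpler route with connector 1.0.x
is myRdd.saveToCassandra, which manages sessions and batching itself.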

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: How to use spark-cassandra-connector in spark-shell?

2014-08-08 Thread chutium
Try adding the following jars to the classpath:

xxx/cassandra-all-2.0.6.jar:xxx/cassandra-thrift-2.0.6.jar:xxx/libthrift-0.9.1.jar:xxx/cassandra-driver-spark_2.10-1.0.0-SNAPSHOT.jar:xxx/cassandra-java-driver-2.0.2/cassandra-driver-core-2.0.2.jar:xxx/cassandra-java-driver-2.0.2/cassandra-driver-dse-2.0.2.jar

Then, in spark-shell:

import org.apache.spark.{SparkConf, SparkContext}
val conf = new SparkConf(true).set("cassandra.connection.host",
  "your-cassandra-host")
val sc = new SparkContext("local[1]", "cassandra-driver", conf)
import com.datastax.driver.spark._
sc.cassandraTable("db1", "table1").select("key").count




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-use-spark-cassandra-connector-in-spark-shell-tp11757p11781.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: How to use spark-cassandra-connector in spark-shell?

2014-08-08 Thread Thomas Nieborowski
See near the bottom of this page:
http://tobert.github.io/post/2014-07-15-installing-cassandra-spark-stack.html







-- 
Thomas Nieborowski
510-207-7049 mobile
510-339-1716 home


How to use spark-cassandra-connector in spark-shell?

2014-08-07 Thread Gary Zhao
Hello

Is it possible to use spark-cassandra-connector in spark-shell?

Thanks
Gary


Re: How to use spark-cassandra-connector in spark-shell?

2014-08-07 Thread Andrew Ash
Yes, I've done it before.


On Thu, Aug 7, 2014 at 10:18 PM, Gary Zhao garyz...@gmail.com wrote:

 Hello

 Is it possible to use spark-cassandra-connector in spark-shell?

 Thanks
 Gary



Re: How to use spark-cassandra-connector in spark-shell?

2014-08-07 Thread Gary Zhao
Thanks Andrew. How did you do it?


On Thu, Aug 7, 2014 at 10:20 PM, Andrew Ash and...@andrewash.com wrote:

 Yes, I've done it before.


 On Thu, Aug 7, 2014 at 10:18 PM, Gary Zhao garyz...@gmail.com wrote:

 Hello

 Is it possible to use spark-cassandra-connector in spark-shell?

  Thanks
 Gary





Re: How to use spark-cassandra-connector in spark-shell?

2014-08-07 Thread Andrew Ash
I don't remember the details, but I think it just took adding the
spark-cassandra-connector jar to the spark shell's classpath with --jars or
maybe ADD_JARS and then it worked.
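
For example, something along these lines (the jar path and version are
placeholders; an assembly jar that bundles the connector's dependencies saves
listing each driver jar separately):

./bin/spark-shell --jars /path/to/spark-cassandra-connector-assembly-1.0.0.jar

# older shells picked extra jars up from the ADD_JARS environment variable instead
ADD_JARS=/path/to/spark-cassandra-connector-assembly-1.0.0.jar ./bin/spark-shell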


On Thu, Aug 7, 2014 at 10:24 PM, Gary Zhao garyz...@gmail.com wrote:

 Thanks Andrew. How did you do it?


 On Thu, Aug 7, 2014 at 10:20 PM, Andrew Ash and...@andrewash.com wrote:

 Yes, I've done it before.


 On Thu, Aug 7, 2014 at 10:18 PM, Gary Zhao garyz...@gmail.com wrote:

 Hello

 Is it possible to use spark-cassandra-connector in spark-shell?

  Thanks
 Gary






spark-cassandra-connector issue

2014-08-06 Thread Gary Zhao
Hello

I'm trying to modify a Spark sample app to integrate with Cassandra; however,
I see an exception when submitting the app. Does anyone know why it happens?

Exception in thread main java.lang.NoClassDefFoundError:
com/datastax/spark/connector/rdd/reader/RowReaderFactory
at SimpleApp.main(SimpleApp.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:303)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:55)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException:
com.datastax.spark.connector.rdd.reader.RowReaderFactory
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 8 more


Source code:

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import com.datastax.spark.connector._

object SimpleApp {

  def main(args: Array[String]) {

    val conf = new SparkConf(true)
      .set("spark.cassandra.connection.host", "10.20.132.44")
      .setAppName("Simple Application")

    val logFile = "/home/gzhao/spark/spark-1.0.2-bin-hadoop1/README.md" //
    // Should be some file on your system

    val sc = new SparkContext("spark://mcs-spark-slave1-staging:7077",
      "idfa_map", conf)

    val rdd = sc.cassandraTable("idfa_map", "bcookie_idfa")

    val logData = sc.textFile(logFile, 2).cache()

    val numAs = logData.filter(line => line.contains("a")).count()

    val numBs = logData.filter(line => line.contains("b")).count()

    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))

  }

}
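
For what it's worth, a NoClassDefFoundError for com.datastax.spark.connector
classes at launch usually means the connector jar was on the compile classpath
but is not on the runtime classpath of the driver and executors. A sketch of
shipping it explicitly with spark-submit (the jar paths are placeholders; the
other common fix is building the application as a single assembly jar that
bundles the connector):

./bin/spark-submit \
  --class SimpleApp \
  --master spark://mcs-spark-slave1-staging:7077 \
  --jars /path/to/spark-cassandra-connector_2.10-1.0.0-rc3.jar,/path/to/cassandra-driver-core-2.0.2.jar \
  /path/to/simple-app_2.10-1.0.jar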