Reading from cassandra store in rdd

2016-05-04 Thread Yasemin Kaya
Hi,

I asked this question on the DataStax group, but I want to ask the spark-user
group as well, since someone here may have faced this problem.

I have data in Cassandra and want to load it into a Spark RDD. I get an
error; I searched for it but nothing changed. Can anyone help me fix it?
I can connect to Cassandra, and cqlsh works, so there is no problem there.
I took the code from the DataStax GitHub page.
My code and my error are linked.

I am using
*Cassandra 3.5*
*spark: 1.5.2*

*cassandra-driver : 3.0.0*
*spark-cassandra-connector_2.10-1.5.0.jar*
*spark-cassandra-connector-java_2.10-1.5.0-M1.jar*
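
For reference, a minimal sketch of reading a table into an RDD with the
connector's Java API (the keyspace, table, and column names here are
hypothetical, and the connector and Cassandra versions must match):

import static com.datastax.spark.connector.japi.CassandraJavaUtil.javaFunctions;
import static com.datastax.spark.connector.japi.CassandraJavaUtil.mapColumnTo;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

SparkConf conf = new SparkConf()
        .setAppName("CassandraRead")
        .set("spark.cassandra.connection.host", "127.0.0.1");
JavaSparkContext sc = new JavaSparkContext(conf);

// Read a single column of my_keyspace.my_table into a JavaRDD<String>.
JavaRDD<String> names = javaFunctions(sc)
        .cassandraTable("my_keyspace", "my_table", mapColumnTo(String.class))
        .select("name");
System.out.println(names.count());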

Best,
yasemin

-- 
hiç ender hiç


Re: Saving model S3

2016-03-21 Thread Yasemin Kaya
Hi Ted,

I don't understand the issue. What do you want to know? Could you be more
specific, please?

2016-03-21 15:24 GMT+02:00 Ted Yu <yuzhih...@gmail.com>:

> Was speculative execution enabled ?
>
> Thanks
>
> On Mar 21, 2016, at 6:19 AM, Yasemin Kaya <godo...@gmail.com> wrote:
>
> Hi,
>
> I am reading data from S3 and I also want to save my model to S3. The
> reading part works without error, but when I save the model I get this error
> <https://gist.github.com/yaseminn/a22808a9a69a95fbf741>. I tried to
> switch from s3n to s3a, but nothing changed; I just get different errors.
>
> *reading path*
> s3n://tani-online/weblog/
>
> *model saving path*
> s3n://tani-online/model/
>
> *configuration*
>
> sc.hadoopConfiguration().set("fs.s3.impl","org.apache.hadoop.fs.s3native.NativeS3FileSystem");
> sc.hadoopConfiguration().set("fs.s3n.awsAccessKeyId", AWS_ACCESS_KEY_ID);
> sc.hadoopConfiguration().set("fs.s3n.awsSecretAccessKey",
> AWS_SECRET_ACCESS_KEY);
>
>
> ps: I am using spark-1.6.0-bin-hadoop2.4
>
> Best,
> yasemin
>
> --
> hiç ender hiç
>
>


-- 
hiç ender hiç


Saving model S3

2016-03-21 Thread Yasemin Kaya
Hi,

I am reading data from S3 and I also want to save my model to S3. The
reading part works without error, but when I save the model I get this
error. I tried to switch from s3n to s3a, but nothing changed; I just get
different errors.

*reading path*
s3n://tani-online/weblog/

*model saving path*
s3n://tani-online/model/

*configuration*
sc.hadoopConfiguration().set("fs.s3.impl","org.apache.hadoop.fs.s3native.NativeS3FileSystem");
sc.hadoopConfiguration().set("fs.s3n.awsAccessKeyId", AWS_ACCESS_KEY_ID);
sc.hadoopConfiguration().set("fs.s3n.awsSecretAccessKey",
AWS_SECRET_ACCESS_KEY);


ps: I am using spark-1.6.0-bin-hadoop2.4
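
For reference, a minimal sketch of the save path with this configuration.
The model type (RandomForestModel) is an assumption; the bucket follows the
post:

import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.mllib.tree.model.RandomForestModel;

public class SaveModelToS3 {
    // Hypothetical helper; MLlib models save through the underlying SparkContext.
    public static void save(JavaSparkContext sc, RandomForestModel model,
                            String accessKey, String secretKey) {
        sc.hadoopConfiguration().set("fs.s3.impl",
                "org.apache.hadoop.fs.s3native.NativeS3FileSystem");
        sc.hadoopConfiguration().set("fs.s3n.awsAccessKeyId", accessKey);
        sc.hadoopConfiguration().set("fs.s3n.awsSecretAccessKey", secretKey);
        model.save(sc.sc(), "s3n://tani-online/model/");
    }
}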

Best,
yasemin

-- 
hiç ender hiç


Re: reading file from S3

2016-03-16 Thread Yasemin Kaya
Hi,

Thanks a lot, everyone. I understand now that my problem came from the
*hadoop version*: I moved to the Spark 1.6.0 build for *Hadoop 2.4* and
there is no problem anymore.
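
For reference, a sketch of the access pattern that avoids embedding the
credentials in the URI, which is what tripped the earlier attempts (bucket
and file names follow the thread; reading the keys from the environment is
an assumption):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("S3Read"));
// Keys go into the Hadoop configuration, never into the URI.
sc.hadoopConfiguration().set("fs.s3n.awsAccessKeyId", System.getenv("AWS_ACCESS_KEY_ID"));
sc.hadoopConfiguration().set("fs.s3n.awsSecretAccessKey", System.getenv("AWS_SECRET_ACCESS_KEY"));

// Note: no slash between s3n:// and the bucket name.
JavaRDD<String> lines = sc.textFile("s3n://yasemindeneme/deneme.txt");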

Best,
yasemin

2016-03-15 17:31 GMT+02:00 Gourav Sengupta <gourav.sengu...@gmail.com>:

> Once again, please use roles, there is no way that you have to specify the
> access keys in the URI under any situation. Please read Amazon
> documentation and they will say the same. The only situation when you use
> the access keys in URI is when you have not read the Amazon documentation :)
>
> Regards,
> Gourav
>
> On Tue, Mar 15, 2016 at 3:22 PM, Sabarish Sasidharan <
> sabarish.sasidha...@manthan.com> wrote:
>
>> There are many solutions to a problem.
>>
>> Also understand that sometimes your situation might be such. For ex what
>> if you are accessing S3 from your Spark job running in your continuous
>> integration server sitting in your data center or may be a box under your
>> desk. And sometimes you are just trying something.
>>
>> Also understand that sometimes you want answers to solve your problem at
>> hand without redirecting you to something else. Understand what you
>> suggested is an appropriate way of doing it, which I myself have proposed
>> before, but that doesn't solve the OP's problem at hand.
>>
>> Regards
>> Sab
>> On 15-Mar-2016 8:27 pm, "Gourav Sengupta" <gourav.sengu...@gmail.com>
>> wrote:
>>
>>> Oh!!! What the hell
>>>
>>> Please never use the URI
>>>
>>> *s3n://AWS_ACCESS_KEY_ID:AWS_SECRET_ACCESS_KEY.*That is a major cause
>>> of pain, security issues, code maintenance issues and ofcourse something
>>> that Amazon strongly suggests that we do not use. Please use roles and you
>>> will not have to worry about security.
>>>
>>> Regards,
>>> Gourav Sengupta
>>>
>>> On Tue, Mar 15, 2016 at 2:38 PM, Sabarish Sasidharan <
>>> sabarish@gmail.com> wrote:
>>>
>>>> You have a slash before the bucket name. It should be @.
>>>>
>>>> Regards
>>>> Sab
>>>> On 15-Mar-2016 4:03 pm, "Yasemin Kaya" <godo...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I am using Spark 1.6.0 standalone and I want to read a txt file from
>>>>> S3 bucket named yasemindeneme and my file name is deneme.txt. But I am
>>>>> getting this error. Here is the simple code
>>>>> <https://gist.github.com/anonymous/6d174f8587f0f3fd2334>
>>>>> Exception in thread "main" java.lang.IllegalArgumentException: Invalid
>>>>> hostname in URI s3n://AWS_ACCESS_KEY_ID:AWS_SECRET_ACCESS_KEY@
>>>>> /yasemindeneme/deneme.txt
>>>>> at
>>>>> org.apache.hadoop.fs.s3.S3Credentials.initialize(S3Credentials.java:45)
>>>>> at
>>>>> org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.initialize(Jets3tNativeFileSystemStore.java:55)
>>>>>
>>>>>
>>>>> I try 2 options
>>>>> *sc.hadoopConfiguration() *and
>>>>> *sc.textFile("s3n://AWS_ACCESS_KEY_ID:AWS_SECRET_ACCESS_KEY@/yasemindeneme/deneme.txt/");*
>>>>>
>>>>> Also I did export AWS_ACCESS_KEY_ID= .
>>>>>  export AWS_SECRET_ACCESS_KEY=
>>>>> But there is no change about error.
>>>>>
>>>>> Could you please help me about this issue?
>>>>>
>>>>>
>>>>> --
>>>>> hiç ender hiç
>>>>>
>>>>
>>>
>


-- 
hiç ender hiç


Re: reading file from S3

2016-03-15 Thread Yasemin Kaya
Hi Safak,

I changed the keys, but there is no change.

Best,
yasemin


2016-03-15 12:46 GMT+02:00 Şafak Serdar Kapçı <sska...@gmail.com>:

> Hello Yasemin,
> Maybe your key id or access key has special chars like backslash or
> something. You need to change it.
> Best Regards,
> Safak.
>
> 2016-03-15 12:33 GMT+02:00 Yasemin Kaya <godo...@gmail.com>:
>
>> Hi,
>>
>> I am using Spark 1.6.0 standalone and I want to read a txt file from S3
>> bucket named yasemindeneme and my file name is deneme.txt. But I am getting
>> this error. Here is the simple code
>> <https://gist.github.com/anonymous/6d174f8587f0f3fd2334>
>> Exception in thread "main" java.lang.IllegalArgumentException: Invalid
>> hostname in URI s3n://AWS_ACCESS_KEY_ID:AWS_SECRET_ACCESS_KEY@
>> /yasemindeneme/deneme.txt
>> at org.apache.hadoop.fs.s3.S3Credentials.initialize(S3Credentials.java:45)
>> at
>> org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.initialize(Jets3tNativeFileSystemStore.java:55)
>>
>>
>> I try 2 options
>> *sc.hadoopConfiguration() *and
>> *sc.textFile("s3n://AWS_ACCESS_KEY_ID:AWS_SECRET_ACCESS_KEY@/yasemindeneme/deneme.txt/");*
>>
>> Also I did export AWS_ACCESS_KEY_ID= .
>>  export AWS_SECRET_ACCESS_KEY=
>> But there is no change about error.
>>
>> Could you please help me about this issue?
>>
>>
>> --
>> hiç ender hiç
>>
>
>


-- 
hiç ender hiç


reading file from S3

2016-03-15 Thread Yasemin Kaya
Hi,

I am using Spark 1.6.0 standalone and I want to read a txt file from an S3
bucket named yasemindeneme; my file name is deneme.txt. But I am getting
this error. Here is the simple code.

Exception in thread "main" java.lang.IllegalArgumentException: Invalid
hostname in URI s3n://AWS_ACCESS_KEY_ID:AWS_SECRET_ACCESS_KEY@
/yasemindeneme/deneme.txt
at org.apache.hadoop.fs.s3.S3Credentials.initialize(S3Credentials.java:45)
at
org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.initialize(Jets3tNativeFileSystemStore.java:55)


I tried two options:
*sc.hadoopConfiguration() *and
*sc.textFile("s3n://AWS_ACCESS_KEY_ID:AWS_SECRET_ACCESS_KEY@/yasemindeneme/deneme.txt/");*

I also did export AWS_ACCESS_KEY_ID= .
 export AWS_SECRET_ACCESS_KEY=
But the error does not change.

Could you please help me with this issue?


-- 
hiç ender hiç


concurrent.RejectedExecutionException

2016-01-23 Thread Yasemin Kaya
Hi all,

I'm using Spark 1.5 and getting this error. Could you help? I can't
understand it.

16/01/23 10:11:59 ERROR TaskSchedulerImpl: Exception in statusUpdate
java.util.concurrent.RejectedExecutionException: Task
org.apache.spark.scheduler.TaskResultGetter$$anon$2@62c72719 rejected from
java.util.concurrent.ThreadPoolExecutor@57f54b52[Terminated, pool size = 0,
active threads = 0, queued tasks = 0, completed tasks = 85]
at
java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2047)
at
java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823)
at
java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1369)
at
org.apache.spark.scheduler.TaskResultGetter.enqueueSuccessfulTask(TaskResultGetter.scala:49)
at
org.apache.spark.scheduler.TaskSchedulerImpl.liftedTree2$1(TaskSchedulerImpl.scala:347)
at
org.apache.spark.scheduler.TaskSchedulerImpl.statusUpdate(TaskSchedulerImpl.scala:330)
at
org.apache.spark.scheduler.local.LocalEndpoint$$anonfun$receive$1.applyOrElse(LocalBackend.scala:65)
at org.apache.spark.rpc.akka.AkkaRpcEnv.org
$apache$spark$rpc$akka$AkkaRpcEnv$$processMessage(AkkaRpcEnv.scala:177)
at
org.apache.spark.rpc.akka.AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1$$anonfun$receiveWithLogging$1$$anonfun$applyOrElse$4.apply$mcV$sp(AkkaRpcEnv.scala:126)
at org.apache.spark.rpc.akka.AkkaRpcEnv.org
$apache$spark$rpc$akka$AkkaRpcEnv$$safelyCall(AkkaRpcEnv.scala:197)
at
org.apache.spark.rpc.akka.AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1$$anonfun$receiveWithLogging$1.applyOrElse(AkkaRpcEnv.scala:125)
at
scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
at
scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
at
scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
at
org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:59)
at
org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:42)
at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
at
org.apache.spark.util.ActorLogReceive$$anon$1.applyOrElse(ActorLogReceive.scala:42)
at akka.actor.Actor$class.aroundReceive(Actor.scala:467)
at
org.apache.spark.rpc.akka.AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1.aroundReceive(AkkaRpcEnv.scala:92)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
at akka.actor.ActorCell.invoke(ActorCell.scala:487)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
at akka.dispatch.Mailbox.run(Mailbox.scala:220)
at
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
16/01/23 10:12:00 WARN QueuedThreadPool: 6 threads could not be stopped
16/01/23 10:12:01 INFO MapOutputTrackerMasterEndpoint:
MapOutputTrackerMasterEndpoint stopped!
16/01/23 10:12:03 INFO MemoryStore: MemoryStore cleared
16/01/23 10:12:04 INFO BlockManager: BlockManager stopped
16/01/23 10:12:04 INFO BlockManagerMaster: BlockManagerMaster stopped
16/01/23 10:12:05 INFO SparkContext: Successfully stopped SparkContext
16/01/23 10:12:05 ERROR Executor: Exception in task 2.0 in stage 35.0 (TID
87)
java.io.FileNotFoundException:
/tmp/blockmgr-aed2b250-8893-4b5b-b5ef-91928f5547b6/1f/shuffle_9_2_0.data
(No such file or directory)
at java.io.FileOutputStream.open0(Native Method)
at java.io.FileOutputStream.open(FileOutputStream.java:270)
at java.io.FileOutputStream.&lt;init&gt;(FileOutputStream.java:213)
at
org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:88)
at
org.apache.spark.storage.DiskBlockObjectWriter.write(DiskBlockObjectWriter.scala:177)
at
org.apache.spark.util.collection.WritablePartitionedPairCollection$$anon$1.writeNext(WritablePartitionedPairCollection.scala:55)
at
org.apache.spark.util.collection.ExternalSorter.writePartitionedFile(ExternalSorter.scala:675)
at
org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:80)
at
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
at
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:88)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
16/01/23 10:12:05 ERROR Executor: Exception in task 1.0 in stage 35.0 (TID
86)
java.io.FileNotFoundException:
/tmp/blockmgr-aed2b250-8893-4b5b-b5ef-91928f5547b6/02/shuffle_9_1_0.data
(No such file or directory)

Re: write new data to mysql

2016-01-08 Thread Yasemin Kaya
Hi,
There is no write function like the one Todd mentioned, or I can't find it.
The code and the error are in this gist
<https://gist.github.com/yaseminn/f5a2b78b126df71dfd0b>. Could you check it
out, please?

Best,
yasemin

2016-01-08 18:23 GMT+02:00 Todd Nist <tsind...@gmail.com>:

> It is not clear from the information provided why the insertIntoJDBC
> failed in #2.  I would note that method on the DataFrame as been deprecated
> since 1.4, not sure what version your on.  You should be able to do
> something like this:
>
>  DataFrame.write.mode(SaveMode.Append).jdbc(MYSQL_CONNECTION_URL_WRITE,
> "track_on_alarm", connectionProps)
>
> HTH.
>
> -Todd
>
> On Fri, Jan 8, 2016 at 10:53 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>
>> Which Spark release are you using ?
>>
>> For case #2, was there any error / clue in the logs ?
>>
>> Cheers
>>
>> On Fri, Jan 8, 2016 at 7:36 AM, Yasemin Kaya <godo...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I want to write dataframe existing mysql table, but when i use
>>> *peopleDataFrame.insertIntoJDBC(MYSQL_CONNECTION_URL_WRITE,
>>> "track_on_alarm",false)*
>>>
>>> it says "Table track_on_alarm already exists."
>>>
>>> And when i *use peopleDataFrame.insertIntoJDBC(MYSQL_CONNECTION_URL_WRITE,
>>> "track_on_alarm",true)*
>>>
>>> i lost the existing data.
>>>
>>> How i can write new data to db?
>>>
>>> Best,
>>> yasemin
>>>
>>> --
>>> hiç ender hiç
>>>
>>
>>
>


-- 
hiç ender hiç


Re: write new data to mysql

2016-01-08 Thread Yasemin Kaya
When I changed the version to 1.6.0, it worked.
Thanks.
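
For the record, a fleshed-out sketch of the append-mode write Todd
suggested, which is what worked on Spark 1.6 (the connection properties are
placeholders; peopleDataFrame and MYSQL_CONNECTION_URL_WRITE come from the
thread):

import java.util.Properties;
import org.apache.spark.sql.SaveMode;

Properties connectionProps = new Properties();
connectionProps.put("user", "dbuser");      // placeholder
connectionProps.put("password", "dbpass");  // placeholder

// Appends new rows to the existing table instead of failing or overwriting.
peopleDataFrame.write()
        .mode(SaveMode.Append)
        .jdbc(MYSQL_CONNECTION_URL_WRITE, "track_on_alarm", connectionProps);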

2016-01-08 21:27 GMT+02:00 Yasemin Kaya <godo...@gmail.com>:

> Hi,
> There is no write function that Todd mentioned or i cant find it.
> The code and error are in gist
> <https://gist.github.com/yaseminn/f5a2b78b126df71dfd0b>. Could you check
> it out please?
>
> Best,
> yasemin
>
> 2016-01-08 18:23 GMT+02:00 Todd Nist <tsind...@gmail.com>:
>
>> It is not clear from the information provided why the insertIntoJDBC
>> failed in #2.  I would note that method on the DataFrame as been deprecated
>> since 1.4, not sure what version your on.  You should be able to do
>> something like this:
>>
>>  DataFrame.write.mode(SaveMode.Append).jdbc(MYSQL_CONNECTION_URL_WRITE,
>> "track_on_alarm", connectionProps)
>>
>> HTH.
>>
>> -Todd
>>
>> On Fri, Jan 8, 2016 at 10:53 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>>
>>> Which Spark release are you using ?
>>>
>>> For case #2, was there any error / clue in the logs ?
>>>
>>> Cheers
>>>
>>> On Fri, Jan 8, 2016 at 7:36 AM, Yasemin Kaya <godo...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I want to write dataframe existing mysql table, but when i use
>>>> *peopleDataFrame.insertIntoJDBC(MYSQL_CONNECTION_URL_WRITE,
>>>> "track_on_alarm",false)*
>>>>
>>>> it says "Table track_on_alarm already exists."
>>>>
>>>> And when i *use peopleDataFrame.insertIntoJDBC(MYSQL_CONNECTION_URL_WRITE,
>>>> "track_on_alarm",true)*
>>>>
>>>> i lost the existing data.
>>>>
>>>> How i can write new data to db?
>>>>
>>>> Best,
>>>> yasemin
>>>>
>>>> --
>>>> hiç ender hiç
>>>>
>>>
>>>
>>
>
>
> --
> hiç ender hiç
>



-- 
hiç ender hiç


write new data to mysql

2016-01-08 Thread Yasemin Kaya
Hi,

I want to write a dataframe to an existing MySQL table, but when I use
*peopleDataFrame.insertIntoJDBC(MYSQL_CONNECTION_URL_WRITE,
"track_on_alarm", false)*

it says "Table track_on_alarm already exists."

And when I use *peopleDataFrame.insertIntoJDBC(MYSQL_CONNECTION_URL_WRITE,
"track_on_alarm", true)*

I lose the existing data.

How can I write new data to the db?

Best,
yasemin

-- 
hiç ender hiç


Struggling time by data

2015-12-25 Thread Yasemin Kaya
hi,

I have struggled with this data for a couple of days and I can't find a
solution. Could you help me?

*DATA:*
*(userid1_time, url) *
*(userid1_time2, url2)*


I want to get the URLs that fall within 30 minutes of each other.

*RESULT:*
*If time2-time1 < 30 min*
*(user1, [url1, url2] )*

Best,
yasemin
-- 
hiç ender hiç


Re: Struggling time by data

2015-12-25 Thread Yasemin Kaya
It is OK, but I actually want to categorize the URLs into sessions.

*DATA:* (sorted by time)
*(userid1_time, url1) *
*(userid1_time2, url2)*
*(userid1_time3, url3) *
*(userid1_time4, url4)*

*RESULT: *
*url1 *is already added to* session1*
*time2-time1 < 30 min *so* url2 *goes to* session1*
*time3-time2 > 30 min *so* url3 *goes to* session2*
*time4-time3 < 30 min *so *url4* goes to* session2*

*(user1, [url1, url2] [url3, url4])*

Does your solution fit my problem?
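
For reference, a minimal sessionization sketch under the assumptions in
this thread: keys look like "userid_epochMillis", events are grouped per
user and sorted by time, and a gap over 30 minutes starts a new session.
The input is assumed to be a JavaPairRDD<String, String>:

import java.util.ArrayList;
import java.util.List;
import org.apache.spark.api.java.JavaPairRDD;
import scala.Tuple2;

JavaPairRDD<String, List<List<String>>> sessions = input
    .mapToPair(kv -> {
        String[] parts = kv._1().split("_");
        return new Tuple2<String, Tuple2<Long, String>>(parts[0],
                new Tuple2<Long, String>(Long.parseLong(parts[1]), kv._2()));
    })
    .groupByKey()
    .mapValues(events -> {
        // Sort each user's events by time.
        List<Tuple2<Long, String>> sorted = new ArrayList<>();
        for (Tuple2<Long, String> e : events) sorted.add(e);
        sorted.sort((a, b) -> Long.compare(a._1(), b._1()));

        // Split into sessions wherever the gap exceeds 30 minutes.
        List<List<String>> result = new ArrayList<>();
        List<String> current = new ArrayList<>();
        long prev = Long.MIN_VALUE;
        for (Tuple2<Long, String> e : sorted) {
            if (!current.isEmpty() && e._1() - prev > 30L * 60 * 1000) {
                result.add(current);
                current = new ArrayList<>();
            }
            current.add(e._2());
            prev = e._1();
        }
        if (!current.isEmpty()) result.add(current);
        return result;
    });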

2015-12-25 12:23 GMT+02:00 Xingchi Wang <regrec...@gmail.com>:

> map{case(x, y) => s = x.split("_"), (s(0), (s(1),
> y)))}.groupByKey().filter{case (_, (a, b)) => abs(a._1, a._1) < 30min}
>
> does it work for you ?
>
> 2015-12-25 16:53 GMT+08:00 Yasemin Kaya <godo...@gmail.com>:
>
>> hi,
>>
>> I have struggled this data couple of days, i cant find solution. Could
>> you help me?
>>
>> *DATA:*
>> *(userid1_time, url) *
>> *(userid1_time2, url2)*
>>
>>
>> I want to get url which are in 30 min.
>>
>> *RESULT:*
>> *If time2-time1<30 min*
>> *(user1, [url1, url2] )*
>>
>> Best,
>> yasemin
>> --
>> hiç ender hiç
>>
>
>


-- 
hiç ender hiç


rdd split into new rdd

2015-12-23 Thread Yasemin Kaya
Hi,

I have data in
*JavaPairRDD<String, TreeMap<String, Integer>> *format. For example:

*(1610, {a=1, b=1, c=2, d=2}) *

I want to get
*JavaPairRDD<String, List>*. For example:


*(1610, {a, b})*
*(1610, {c, d})*

Is there a way to solve this problem?

Best,
yasemin
-- 
hiç ender hiç


Re: rdd split into new rdd

2015-12-23 Thread Yasemin Kaya
How can I use mapPartitions? Could you give me an example?
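
Not from the thread, but a sketch of one way to do the split with
flatMapToPair rather than mapPartitions: group the map's keys by their
value and emit one output pair per group. This assumes the Spark 1.x Java
API, where the function returns an Iterable:

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import org.apache.spark.api.java.JavaPairRDD;
import scala.Tuple2;

// input: JavaPairRDD<String, TreeMap<String, Integer>>, e.g. (1610, {a=1, b=1, c=2, d=2})
JavaPairRDD<String, List<String>> result = input.flatMapToPair(kv -> {
    // Group the map's keys by their value: {1=[a, b], 2=[c, d]}.
    Map<Integer, List<String>> byValue = new TreeMap<>();
    for (Map.Entry<String, Integer> e : kv._2().entrySet()) {
        byValue.computeIfAbsent(e.getValue(), v -> new ArrayList<>()).add(e.getKey());
    }
    // Emit one pair per group: (1610, [a, b]) and (1610, [c, d]).
    List<Tuple2<String, List<String>>> out = new ArrayList<>();
    for (List<String> keys : byValue.values()) {
        out.add(new Tuple2<>(kv._1(), keys));
    }
    return out; // Spark 1.x PairFlatMapFunction returns an Iterable
});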

2015-12-23 17:26 GMT+02:00 Stéphane Verlet <kaweahsoluti...@gmail.com>:

> You should be able to do that using mapPartition
>
> On Wed, Dec 23, 2015 at 8:24 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>
>> bq. {a=1, b=1, c=2, d=2}
>>
>> Can you elaborate your criteria a bit more ? The above seems to be a Set,
>> not a Map.
>>
>> Cheers
>>
>> On Wed, Dec 23, 2015 at 7:11 AM, Yasemin Kaya <godo...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I have data
>>> *JavaPairRDD<String, TreeMap<String, Integer>> *format. In example:
>>>
>>> *(1610, {a=1, b=1, c=2, d=2}) *
>>>
>>> I want to get
>>> *JavaPairRDD<String, List>* In example:
>>>
>>>
>>> *(1610, {a, b})*
>>> *(1610, {c, d})*
>>>
>>> Is there a way to solve this problem?
>>>
>>> Best,
>>> yasemin
>>> --
>>> hiç ender hiç
>>>
>>
>>
>


-- 
hiç ender hiç


groupByKey()

2015-12-08 Thread Yasemin Kaya
Hi,

Sorry for the long input, but it shows my situation.

I have two lists and I want to groupByKey them, but some values of the
lists disappear. I can't understand this.

(8867989628612931721, [1, 1, 0, 0, 0, ..., 0])
(a long vector: two leading 1s followed only by zeros)

(8867989628612931721, [1, 1, 0, 0, 0, ..., 0, *1*, 0, ..., 0])
(the same key: two leading 1s and one extra *1* near the middle, zeros
everywhere else)

result of groupByKey:
(8867989628612931721, [[1, 1, 0, 0, 0, ..., 0, ...

rdd conversion

2015-10-26 Thread Yasemin Kaya
Hi,

I have *JavaRDD<List<Map<Integer, ArrayList>>>* and I want to
convert every map to a pair RDD, I mean
* JavaPairRDD<Integer, ArrayList>. *

There is a loop over the list to get each indexed map, but when I write the
code below, it returns me only one pair.

JavaPairRDD<Integer, ArrayList> mapToRDD =
 IdMapValues.mapToPair(new
PairFunction<List<Map<Integer, ArrayList>>, Integer,
ArrayList>() {

@Override
public Tuple2<Integer, ArrayList> call(
List<Map<Integer, ArrayList>> arg0)
throws Exception {
Tuple2<Integer, ArrayList> t = null;
for (int i = 0; i < arg0.size(); ++i) {
for (Map.Entry<Integer, ArrayList> entry : arg0.get(i).entrySet()) {
t = new Tuple2<Integer, ArrayList>(entry.getKey(),
entry.getValue());
}
}

return t;
}
});

As you can see I am using Java. Give me some clue... Thanks.

Best,
yasemin

-- 
hiç ender hiç


Re: rdd conversion

2015-10-26 Thread Yasemin Kaya
But if I put the return inside the loop, the method still wants a return
statement at the end.
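
Ted's point is that mapToPair returns exactly one pair per input record, so
the loop keeps overwriting t and only the last entry survives. A sketch of
a flatMapToPair variant that emits every entry instead (assumes the Spark
1.x Java API, where the function returns an Iterable):

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import org.apache.spark.api.java.JavaPairRDD;
import scala.Tuple2;

JavaPairRDD<Integer, ArrayList> mapToRDD = IdMapValues.flatMapToPair(arg0 -> {
    List<Tuple2<Integer, ArrayList>> out = new ArrayList<>();
    for (Map<Integer, ArrayList> m : arg0) {
        for (Map.Entry<Integer, ArrayList> entry : m.entrySet()) {
            // One output pair per map entry, instead of keeping only the last.
            out.add(new Tuple2<>(entry.getKey(), entry.getValue()));
        }
    }
    return out;
});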

2015-10-26 19:09 GMT+02:00 Ted Yu <yuzhih...@gmail.com>:

> bq.  t = new Tuple2 <Integer, ArrayList> (entry.getKey(),
> entry.getValue());
>
> The return statement is outside the loop.
> That was why you got one RDD.
>
> On Mon, Oct 26, 2015 at 9:40 AM, Yasemin Kaya <godo...@gmail.com> wrote:
>
>> Hi,
>>
>> I have *JavaRDD<List<Map<Integer, ArrayList>>>* and I want to
>> convert every map to pairrdd, i mean
>> * JavaPairRDD<Integer,ArrayList>. *
>>
>> There is a loop in list to get the indexed map, when I write code below,
>> it returns me only one rdd.
>>
>> JavaPairRDD<Integer,ArrayList> mapToRDD =
>>  IdMapValues.mapToPair(new
>> PairFunction<List<Map<Integer,ArrayList>>, Integer,
>> ArrayList>() {
>>
>> @Override
>> public Tuple2<Integer, ArrayList> call(
>> List<Map<Integer, ArrayList>> arg0)
>> throws Exception {
>> Tuple2<Integer, ArrayList> t = null;
>> for(int i=0; i<arg0.size(); ++i){
>> for (Map.Entry<Integer, ArrayList> entry :arg0.get(i).entrySet())
>> {
>> t = new Tuple2 <Integer, ArrayList> (entry.getKey(),
>> entry.getValue());
>> }
>> }
>>
>> return t;
>> }
>> });
>>
>> As you can see i am using java. Give me some clue .. Thanks.
>>
>> Best,
>> yasemin
>>
>> --
>> hiç ender hiç
>>
>
>


-- 
hiç ender hiç


Model exports PMML (Random Forest)

2015-10-07 Thread Yasemin Kaya
Hi,

I want to export my model to PMML, but there is no PMML support for random
forest yet; it is planned for version 1.6. Is it possible to produce my
(random forest) model in PMML XML format manually? Thanks.

Best,
yasemin
-- 
hiç ender hiç


ML Pipeline

2015-09-28 Thread Yasemin Kaya
Hi,

I am using Spark 1.5 and ML Pipeline. I create the model, then give the
model unlabeled data to find the probabilities and predictions. When I want
to see the results, it returns an error.

//creating model
final PipelineModel model = pipeline.fit(trainingData);

JavaRDD<Row> rowRDD1 = unlabeledTest
.map(new Function<Tuple2<String, Vector>, Row>() {

@Override
public Row call(Tuple2<String, Vector> arg0)
throws Exception {
return RowFactory.create(arg0._1(), arg0._2());
}
});
// creating dataframe from row
DataFrame production = sqlContext.createDataFrame(rowRDD1,
new StructType(new StructField[] {
new StructField("id", DataTypes.StringType, false,
Metadata.empty()),
new StructField("features", (new VectorUDT()), false,
Metadata.empty()) }));

DataFrame predictionsProduction = model.transform(production);
*//produces the error*
*predictionsProduction.select("id","features","probability").show(5);*

Here is my code; am I wrong in creating rowRDD1 or production?
error : java.util.NoSuchElementException: key not found: 1.0
at scala.collection.MapLike$class.default(MapLike.scala:228)
at scala.collection.AbstractMap.default(Map.scala:58)
at scala.collection.MapLike$class.apply(MapLike.scala:141)
at scala.collection.AbstractMap.apply(Map.scala:58)
.

How can I solve it ? Thanks.

Have a nice day,
yasemin

-- 
hiç ender hiç


Re: spark 1.5, ML Pipeline Decision Tree Dataframe Problem

2015-09-18 Thread Yasemin Kaya
Thanks, I tried to do that but I can't.
In JavaPairRDD<String, Vector> unlabeledTest, the vector is a DenseVector. I
added import org.apache.spark.sql.SQLContext.implicits$ but there is no
toDF() method; I am using Java, not Scala.
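
implicits/toDF is Scala-only; the Java equivalent is createDataFrame with
an explicit schema. A sketch using VectorUDT, mirroring the approach that
appears in the later ML Pipeline post (sqlContext and unlabeledTest are
assumed to exist):

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.mllib.linalg.Vector;
import org.apache.spark.mllib.linalg.VectorUDT;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.types.*;
import scala.Tuple2;

JavaRDD<Row> rows = unlabeledTest.map(
        (Tuple2<String, Vector> t) -> RowFactory.create(t._1(), t._2()));

StructType schema = new StructType(new StructField[] {
        new StructField("id", DataTypes.StringType, false, Metadata.empty()),
        new StructField("features", new VectorUDT(), false, Metadata.empty()) });

DataFrame production = sqlContext.createDataFrame(rows, schema);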

2015-09-18 20:02 GMT+03:00 Feynman Liang <fli...@databricks.com>:

> What is the type of unlabeledTest?
>
> SQL should be using the VectorUDT we've defined for Vectors
> <https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala#L183>
>  so
> you should be able to just "import sqlContext.implicits._" and then call
> "rdd.toDf()" on your RDD to convert it into a dataframe.
>
> On Fri, Sep 18, 2015 at 7:32 AM, Yasemin Kaya <godo...@gmail.com> wrote:
>
>> Hi,
>>
>> I am using *spark 1.5, ML Pipeline Decision Tree
>> <http://spark.apache.org/docs/latest/ml-decision-tree.html#output-columns>*
>> to get tree's probability. But I have to convert my data to Dataframe type.
>> While creating model there is no problem but when I am using model on my
>> data there is a problem about converting to data frame type. My data type
>> is *JavaPairRDD<String, Vector>* , when I am creating dataframe
>>
>> DataFrame production = sqlContext.createDataFrame(
>> unlabeledTest.values(), Vector.class);
>>
>> *Error says me: *
>> Exception in thread "main" java.lang.ClassCastException:
>> org.apache.spark.mllib.linalg.VectorUDT cannot be cast to
>> org.apache.spark.sql.types.StructType
>>
>> I know if I give LabeledPoint type, there will be no problem. But the
>> data have no label, I wanna predict the label because of this reason I use
>> model on it.
>>
>> Is there way to handle my problem?
>> Thanks.
>>
>>
>> Best,
>> yasemin
>> --
>> hiç ender hiç
>>
>
>


-- 
hiç ender hiç


spark 1.5, ML Pipeline Decision Tree Dataframe Problem

2015-09-18 Thread Yasemin Kaya
Hi,

I am using *spark 1.5, ML Pipeline Decision Tree
<http://spark.apache.org/docs/latest/ml-decision-tree.html#output-columns>*
to get the tree's probability. But I have to convert my data to the
DataFrame type. While creating the model there is no problem, but when I
use the model on my data there is a problem converting to the DataFrame
type. My data type is *JavaPairRDD<String, Vector>*, and when I am creating
the dataframe

DataFrame production = sqlContext.createDataFrame(
unlabeledTest.values(), Vector.class);

*Error says me: *
Exception in thread "main" java.lang.ClassCastException:
org.apache.spark.mllib.linalg.VectorUDT cannot be cast to
org.apache.spark.sql.types.StructType

I know that if I give the LabeledPoint type there will be no problem. But
the data has no label; I want to predict the label, which is why I am using
the model on it.

Is there a way to handle my problem?
Thanks.


Best,
yasemin
-- 
hiç ender hiç


Re: Random Forest MLlib

2015-09-15 Thread Yasemin Kaya
Hi Maximo,

Is there a way to get precision and recall from a pipeline? In the MLlib
version I get precision and recall metrics from MulticlassMetrics, but the
ML Pipeline gives me only testErr.

Thanks
yasemin
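
Not an answer from the thread, but one possible workaround sketch: feed the
pipeline's prediction DataFrame back into MLlib's MulticlassMetrics (the
column names follow the ML defaults; model and testData are assumed):

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.mllib.evaluation.MulticlassMetrics;
import org.apache.spark.sql.DataFrame;
import scala.Tuple2;

DataFrame predictions = model.transform(testData);

// Pair up (prediction, label) and hand them to the MLlib evaluator.
JavaRDD<Tuple2<Object, Object>> predAndLabel = predictions
        .select("prediction", "label").javaRDD()
        .map(row -> new Tuple2<Object, Object>(row.getDouble(0), row.getDouble(1)));

MulticlassMetrics metrics = new MulticlassMetrics(predAndLabel.rdd());
System.out.println("precision = " + metrics.precision());
System.out.println("recall = " + metrics.recall());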

2015-09-10 17:47 GMT+03:00 Yasemin Kaya <godo...@gmail.com>:

> Hi Maximo,
> Thanks alot..
> Hi Yasemin,
>We had the same question and found this:
>
> https://issues.apache.org/jira/browse/SPARK-6884
>
> Thanks,
>Maximo
>
> On Sep 10, 2015, at 9:09 AM, Yasemin Kaya <godo...@gmail.com> wrote:
>
> Hi ,
>
> I am using Random Forest Alg. for recommendation system. I get users and
> users' response yes or no (1/0). But I want to learn the probability of the
> trees. Program says x user yes but with how much probability, I want to get
> these probabilities.
>
> Best,
> yasemin
> --
> hiç ender hiç
>
>
>


-- 
hiç ender hiç


Multilabel classification support

2015-09-11 Thread Yasemin Kaya
Hi,

I want to use MLlib for multilabel classification, but I found
http://spark.apache.org/docs/latest/mllib-classification-regression.html,
and it is not what I mean. Is there a way to do multilabel classification?
Thanks a lot.

Best,
yasemin

-- 
hiç ender hiç


Random Forest MLlib

2015-09-10 Thread Yasemin Kaya
Hi ,

I am using the Random Forest algorithm for a recommendation system. I get
users and users' responses, yes or no (1/0). But I want to learn the
probability from the trees: the program says user x is a yes, but with how
much probability? I want to get these probabilities.

Best,
yasemin
-- 
hiç ender hiç


Re: Random Forest MLlib

2015-09-10 Thread Yasemin Kaya
Hi Maximo,
Thanks a lot.
Hi Yasemin,
   We had the same question and found this:

https://issues.apache.org/jira/browse/SPARK-6884

Thanks,
   Maximo

On Sep 10, 2015, at 9:09 AM, Yasemin Kaya <godo...@gmail.com> wrote:

Hi ,

I am using Random Forest Alg. for recommendation system. I get users and
users' response yes or no (1/0). But I want to learn the probability of the
trees. Program says x user yes but with how much probability, I want to get
these probabilities.

Best,
yasemin
-- 
hiç ender hiç


Re: EC2 cluster doesn't work saveAsTextFile

2015-08-10 Thread Yasemin Kaya
Thanks Dean. I am giving a unique output path, and each time I also delete
the directory before I run the job.
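
For reference, a sketch of deleting a stale output directory
programmatically before saving, along the lines Dean describes (the path
follows the post; sc and rdd are assumed to exist):

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

FileSystem fs = FileSystem.get(sc.hadoopConfiguration());
Path out = new Path("hdfs://172.31.42.10:54310/weblogReadResult");
if (fs.exists(out)) {
    fs.delete(out, true); // true = recursive
}
rdd.saveAsTextFile(out.toString());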

2015-08-10 15:30 GMT+03:00 Dean Wampler deanwamp...@gmail.com:

 Following Hadoop conventions, Spark won't overwrite an existing directory.
 You need to provide a unique output path every time you run the program, or
 delete or rename the target directory before you run the job.

 dean

 Dean Wampler, Ph.D.
 Author: Programming Scala, 2nd Edition
 http://shop.oreilly.com/product/0636920033073.do (O'Reilly)
 Typesafe http://typesafe.com
 @deanwampler http://twitter.com/deanwampler
 http://polyglotprogramming.com

 On Mon, Aug 10, 2015 at 7:08 AM, Yasemin Kaya godo...@gmail.com wrote:

 Hi,

 I have EC2 cluster, and am using spark 1.3, yarn and HDFS . When i submit
 at local there is no problem , but i run at cluster, saveAsTextFile doesn't
 work.*It says me User class threw exception: Output directory
 hdfs://172.31.42.10:54310/./weblogReadResult
 http://172.31.42.10:54310/./weblogReadResult already exists*

 Is there anyone can help me about this issue ?

 Best,
 yasemin



 --
 hiç ender hiç





-- 
hiç ender hiç


EC2 cluster doesn't work saveAsTextFile

2015-08-10 Thread Yasemin Kaya
Hi,

I have an EC2 cluster and am using Spark 1.3, YARN and HDFS. When I submit
locally there is no problem, but when I run on the cluster, saveAsTextFile
doesn't work. *It says: User class threw exception: Output directory
hdfs://172.31.42.10:54310/./weblogReadResult already exists*

Is there anyone who can help me with this issue?

Best,
yasemin



-- 
hiç ender hiç


java.lang.ClassNotFoundException

2015-08-08 Thread Yasemin Kaya
Hi,

I have a little Spark program and I am getting an error that I don't
understand.
My code is https://gist.github.com/yaseminn/522a75b863ad78934bc3.
I am using spark 1.3
Submitting : bin/spark-submit --class MonthlyAverage --master local[4]
weather.jar


error:

~/spark-1.3.1-bin-hadoop2.4$ bin/spark-submit --class MonthlyAverage
--master local[4] weather.jar
java.lang.ClassNotFoundException: MonthlyAverage
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:274)
at
org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:538)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Using Spark's default log4j profile:
org/apache/spark/log4j-defaults.properties


Please help me Asap..

yasemin
-- 
hiç ender hiç


Re: java.lang.ClassNotFoundException

2015-08-08 Thread Yasemin Kaya
Thanks Ted, I solved it :)

2015-08-08 14:07 GMT+03:00 Ted Yu yuzhih...@gmail.com:

 Have you tried including package name in the class name ?

 Thanks



 On Aug 8, 2015, at 12:00 AM, Yasemin Kaya godo...@gmail.com wrote:

 Hi,

 I have a little spark program and i am getting an error why i dont
 understand.
 My code is https://gist.github.com/yaseminn/522a75b863ad78934bc3.
 I am using spark 1.3
 Submitting : bin/spark-submit --class MonthlyAverage --master local[4]
 weather.jar


 error:

 ~/spark-1.3.1-bin-hadoop2.4$ bin/spark-submit --class MonthlyAverage
 --master local[4] weather.jar
 java.lang.ClassNotFoundException: MonthlyAverage
 at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
 at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
 at java.lang.Class.forName0(Native Method)
 at java.lang.Class.forName(Class.java:274)
 at
 org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:538)
 at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
 at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
 at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
 at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
 Using Spark's default log4j profile:
 org/apache/spark/log4j-defaults.properties


 Please help me Asap..

 yasemin
 --
 hiç ender hiç




-- 
hiç ender hiç


Re: Amazon DynamoDB Spark

2015-08-07 Thread Yasemin Kaya
Thanks, Jay.
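
A sketch along the lines of Jay's suggestion, here using a synchronous
putItem per row from foreachPartition rather than the asynchronous client
(the table and attribute names are hypothetical; assumes the AWS SDK v1
DynamoDB client, and results is a JavaPairRDD<String, String>):

import java.util.HashMap;
import java.util.Map;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient;
import com.amazonaws.services.dynamodbv2.model.AttributeValue;
import com.amazonaws.services.dynamodbv2.model.PutItemRequest;
import scala.Tuple2;

results.foreachPartition(rows -> {
    // One client per partition, created on the executor.
    AmazonDynamoDBClient client = new AmazonDynamoDBClient();
    while (rows.hasNext()) {
        Tuple2<String, String> row = rows.next();
        Map<String, AttributeValue> item = new HashMap<>();
        item.put("id", new AttributeValue(row._1()));
        item.put("value", new AttributeValue(row._2()));
        client.putItem(new PutItemRequest("results_table", item));
    }
});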

2015-08-07 19:25 GMT+03:00 Jay Vyas jayunit100.apa...@gmail.com:

 In general the simplest way is that you can use the Dynamo Java API as is
 and call it inside  a map(), and use the asynchronous put() Dynamo api call
 .


  On Aug 7, 2015, at 9:08 AM, Yasemin Kaya godo...@gmail.com wrote:
 
  Hi,
 
  Is there a way using DynamoDB in spark application? I have to persist my
 results to DynamoDB.
 
  Thanx,
  yasemin
 
  --
  hiç ender hiç




-- 
hiç ender hiç


Amazon DynamoDB Spark

2015-08-07 Thread Yasemin Kaya
Hi,

Is there a way to use DynamoDB in a Spark application? I have to persist my
results to DynamoDB.

Thanx,
yasemin

-- 
hiç ender hiç


Broadcast value

2015-06-12 Thread Yasemin Kaya
Hi,

I am reading a broadcast value from a file. I want to use it when creating
Rating objects (ALS). But I am getting null. Here is my code
https://gist.github.com/yaseminn/d6afd0263f6db6ea4ec5 :

Lines 17 and 18 are OK, but line 19 returns null, so line 21 gives me an
error. I don't understand why. Do you have any idea?
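
The gist itself isn't reproduced here, but as a hedged illustration of the
usual cause: a lookup into the broadcast value that misses its key returns
null, and building the Rating then fails. A hypothetical sketch (sc, idMap,
and lines are assumptions, not the gist's actual names):

import java.util.Map;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.broadcast.Broadcast;
import org.apache.spark.mllib.recommendation.Rating;

// idMap was read from the file on the driver.
final Broadcast<Map<String, Integer>> userIds = sc.broadcast(idMap);

JavaRDD<Rating> ratings = lines.map(line -> {
    String[] f = line.split(",");
    Integer uid = userIds.value().get(f[0]); // null when f[0] is not in the map
    if (uid == null) {
        throw new IllegalStateException("unknown user: " + f[0]);
    }
    return new Rating(uid, Integer.parseInt(f[1]), Double.parseDouble(f[2]));
});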


Best,
yasemin



-- 
hiç ender hiç


Re: Cassandra Submit

2015-06-09 Thread Yasemin Kaya
I couldn't find any solution. I can write but I can't read from Cassandra.

2015-06-09 8:52 GMT+03:00 Yasemin Kaya godo...@gmail.com:

 Thanks alot Mohammed, Gerard and Yana.
 I can write to table, but exception returns me. It says *Exception in
 thread main java.io.IOException: Failed to open thrift connection to
 Cassandra at 127.0.0.1:9160 http://127.0.0.1:9160*

 In yaml file :
 rpc_address: localhost
 rpc_port: 9160

 And at project :

 .set(spark.cassandra.connection.host, 127.0.0.1)
 .set(spark.cassandra.connection.rpc.port, 9160);

 or

 .set(spark.cassandra.connection.host, localhost)
 .set(spark.cassandra.connection.rpc.port, 9160);

 whatever I write setting,  I get same exception. Any help ??


 2015-06-08 18:23 GMT+03:00 Yana Kadiyska yana.kadiy...@gmail.com:

 yes, whatever you put for listen_address in cassandra.yaml. Also, you
 should try to connect to your cassandra cluster via bin/cqlsh to make sure
 you have connectivity before you try to make a a connection via spark.

 On Mon, Jun 8, 2015 at 4:43 AM, Yasemin Kaya godo...@gmail.com wrote:

 Hi,
 I run my project on local. How can find ip address of my cassandra host
 ? From cassandra.yaml or ??

 yasemin

 2015-06-08 11:27 GMT+03:00 Gerard Maas gerard.m...@gmail.com:

 ? = ip address of your cassandra host

 On Mon, Jun 8, 2015 at 10:12 AM, Yasemin Kaya godo...@gmail.com
 wrote:

 Hi ,

 How can I find spark.cassandra.connection.host? And what should I
 change ? Should I change cassandra.yaml ?

 Error says me *Exception in thread main java.io.IOException:
 Failed to open native connection to Cassandra at {127.0.1.1}:9042*

 What should I add *SparkConf sparkConf = new
 SparkConf().setAppName(JavaApiDemo).set(**spark.driver.allowMultipleContexts,
 true).set(spark.cassandra.connection.host, ?);*

 Best
 yasemin

 2015-06-06 3:04 GMT+03:00 Mohammed Guller moham...@glassbeam.com:

  Check your spark.cassandra.connection.host setting. It should be
 pointing to one of your Cassandra nodes.



 Mohammed



 *From:* Yasemin Kaya [mailto:godo...@gmail.com]
 *Sent:* Friday, June 5, 2015 7:31 AM
 *To:* user@spark.apache.org
 *Subject:* Cassandra Submit



 Hi,



 I am using cassandraDB in my project. I had that error *Exception in
 thread main java.io.IOException: Failed to open native connection to
 Cassandra at {127.0.1.1}:9042*



 I think I have to modify the submit line. What should I add or remove
 when I submit my project?



 Best,

 yasemin





 --

 hiç ender hiç




 --
 hiç ender hiç





 --
 hiç ender hiç





 --
 hiç ender hiç




-- 
hiç ender hiç


Re: Cassandra Submit

2015-06-09 Thread Yasemin Kaya
Yes, my Cassandra is listening on 9160, I think. Actually, I know it from
the yaml file. The file includes:

rpc_address: localhost
# port for Thrift to listen for clients on
rpc_port: 9160

I checked the port with nc -z localhost 9160; echo $? and it returns 0. I
think it is closed; should I open this port?

2015-06-09 16:55 GMT+03:00 Yana Kadiyska yana.kadiy...@gmail.com:

 Is your cassandra installation actually listening on 9160?

 lsof -i :9160COMMAND   PID USER   FD   TYPE   DEVICE SIZE/OFF NODE NAME
 java29232 ykadiysk   69u  IPv4 42152497  0t0  TCP localhost:9160 
 (LISTEN)

 ​
 I am running an out-of-the box cassandra conf where

 rpc_address: localhost
 # port for Thrift to listen for clients on
 rpc_port: 9160



 On Tue, Jun 9, 2015 at 7:36 AM, Yasemin Kaya godo...@gmail.com wrote:

 I couldn't find any solution. I can write but I can't read from
 Cassandra.

 2015-06-09 8:52 GMT+03:00 Yasemin Kaya godo...@gmail.com:

 Thanks alot Mohammed, Gerard and Yana.
 I can write to table, but exception returns me. It says *Exception in
 thread main java.io.IOException: Failed to open thrift connection to
 Cassandra at 127.0.0.1:9160 http://127.0.0.1:9160*

 In yaml file :
 rpc_address: localhost
 rpc_port: 9160

 And at project :

 .set(spark.cassandra.connection.host, 127.0.0.1)
 .set(spark.cassandra.connection.rpc.port, 9160);

 or

 .set(spark.cassandra.connection.host, localhost)
 .set(spark.cassandra.connection.rpc.port, 9160);

 whatever I write setting,  I get same exception. Any help ??


 2015-06-08 18:23 GMT+03:00 Yana Kadiyska yana.kadiy...@gmail.com:

 yes, whatever you put for listen_address in cassandra.yaml. Also, you
 should try to connect to your cassandra cluster via bin/cqlsh to make sure
 you have connectivity before you try to make a a connection via spark.

 On Mon, Jun 8, 2015 at 4:43 AM, Yasemin Kaya godo...@gmail.com wrote:

 Hi,
 I run my project on local. How can find ip address of my cassandra
 host ? From cassandra.yaml or ??

 yasemin

 2015-06-08 11:27 GMT+03:00 Gerard Maas gerard.m...@gmail.com:

 ? = ip address of your cassandra host

 On Mon, Jun 8, 2015 at 10:12 AM, Yasemin Kaya godo...@gmail.com
 wrote:

 Hi ,

 How can I find spark.cassandra.connection.host? And what should I
 change ? Should I change cassandra.yaml ?

 Error says me *Exception in thread main java.io.IOException:
 Failed to open native connection to Cassandra at {127.0.1.1}:9042*

 What should I add *SparkConf sparkConf = new
 SparkConf().setAppName(JavaApiDemo).set(**spark.driver.allowMultipleContexts,
 true).set(spark.cassandra.connection.host, ?);*

 Best
 yasemin

 2015-06-06 3:04 GMT+03:00 Mohammed Guller moham...@glassbeam.com:

  Check your spark.cassandra.connection.host setting. It should be
 pointing to one of your Cassandra nodes.



 Mohammed



 *From:* Yasemin Kaya [mailto:godo...@gmail.com]
 *Sent:* Friday, June 5, 2015 7:31 AM
 *To:* user@spark.apache.org
 *Subject:* Cassandra Submit



 Hi,



 I am using cassandraDB in my project. I had that error *Exception
 in thread main java.io.IOException: Failed to open native connection 
 to
 Cassandra at {127.0.1.1}:9042*



 I think I have to modify the submit line. What should I add or
 remove when I submit my project?



 Best,

 yasemin





 --

 hiç ender hiç




 --
 hiç ender hiç





 --
 hiç ender hiç





 --
 hiç ender hiç




 --
 hiç ender hiç





-- 
hiç ender hiç


Re: Cassandra Submit

2015-06-09 Thread Yasemin Kaya
Sorry, to correct my answer: I ran lsof -i:9160 in the terminal and the
result is

lsof -i:9160
COMMAND  PID   USER     FD    TYPE DEVICE SIZE/OFF NODE NAME
java     7597  inosens  101u  IPv4 85754  0t0      TCP  localhost:9160 (LISTEN)

so is the 9160 port open or not?

2015-06-09 17:16 GMT+03:00 Yasemin Kaya godo...@gmail.com:

 Yes my cassandra is listening on 9160 I think. Actually I know from yaml
 file. The file includes :

 rpc_address: localhost
 # port for Thrift to listen for clients on
 rpc_port: 9160

 I check the port nc -z localhost 9160; echo $? it returns me 0. I
 think it close, should I open this port ?

 2015-06-09 16:55 GMT+03:00 Yana Kadiyska yana.kadiy...@gmail.com:

 Is your cassandra installation actually listening on 9160?

 lsof -i :9160COMMAND   PID USER   FD   TYPE   DEVICE SIZE/OFF NODE NAME
 java29232 ykadiysk   69u  IPv4 42152497  0t0  TCP localhost:9160 
 (LISTEN)

 ​
 I am running an out-of-the box cassandra conf where

 rpc_address: localhost
 # port for Thrift to listen for clients on
 rpc_port: 9160



 On Tue, Jun 9, 2015 at 7:36 AM, Yasemin Kaya godo...@gmail.com wrote:

 I couldn't find any solution. I can write but I can't read from
 Cassandra.

 2015-06-09 8:52 GMT+03:00 Yasemin Kaya godo...@gmail.com:

 Thanks alot Mohammed, Gerard and Yana.
 I can write to table, but exception returns me. It says *Exception in
 thread main java.io.IOException: Failed to open thrift connection to
 Cassandra at 127.0.0.1:9160 http://127.0.0.1:9160*

 In yaml file :
 rpc_address: localhost
 rpc_port: 9160

 And at project :

 .set(spark.cassandra.connection.host, 127.0.0.1)
 .set(spark.cassandra.connection.rpc.port, 9160);

 or

 .set(spark.cassandra.connection.host, localhost)
 .set(spark.cassandra.connection.rpc.port, 9160);

 whatever I write setting,  I get same exception. Any help ??


 2015-06-08 18:23 GMT+03:00 Yana Kadiyska yana.kadiy...@gmail.com:

 yes, whatever you put for listen_address in cassandra.yaml. Also, you
 should try to connect to your cassandra cluster via bin/cqlsh to make sure
 you have connectivity before you try to make a a connection via spark.

 On Mon, Jun 8, 2015 at 4:43 AM, Yasemin Kaya godo...@gmail.com
 wrote:

 Hi,
 I run my project on local. How can find ip address of my cassandra
 host ? From cassandra.yaml or ??

 yasemin

 2015-06-08 11:27 GMT+03:00 Gerard Maas gerard.m...@gmail.com:

 ? = ip address of your cassandra host

 On Mon, Jun 8, 2015 at 10:12 AM, Yasemin Kaya godo...@gmail.com
 wrote:

 Hi ,

 How can I find spark.cassandra.connection.host? And what should I
 change ? Should I change cassandra.yaml ?

 Error says me *Exception in thread main java.io.IOException:
 Failed to open native connection to Cassandra at {127.0.1.1}:9042*

 What should I add *SparkConf sparkConf = new
 SparkConf().setAppName(JavaApiDemo).set(**spark.driver.allowMultipleContexts,
 true).set(spark.cassandra.connection.host, ?);*

 Best
 yasemin

 2015-06-06 3:04 GMT+03:00 Mohammed Guller moham...@glassbeam.com:

  Check your spark.cassandra.connection.host setting. It should be
 pointing to one of your Cassandra nodes.



 Mohammed



 *From:* Yasemin Kaya [mailto:godo...@gmail.com]
 *Sent:* Friday, June 5, 2015 7:31 AM
 *To:* user@spark.apache.org
 *Subject:* Cassandra Submit



 Hi,



 I am using cassandraDB in my project. I had that error *Exception
 in thread main java.io.IOException: Failed to open native 
 connection to
 Cassandra at {127.0.1.1}:9042*



 I think I have to modify the submit line. What should I add or
 remove when I submit my project?



 Best,

 yasemin





 --

 hiç ender hiç




 --
 hiç ender hiç





 --
 hiç ender hiç





 --
 hiç ender hiç




 --
 hiç ender hiç





 --
 hiç ender hiç




-- 
hiç ender hiç


Re: Cassandra Submit

2015-06-09 Thread Yasemin Kaya
My jar files are:

cassandra-driver-core-2.1.5.jar
cassandra-thrift-2.1.3.jar
guava-18.jar
jsr166e-1.1.0.jar
spark-assembly-1.3.0.jar
spark-cassandra-connector_2.10-1.3.0-M1.jar
spark-cassandra-connector-java_2.10-1.3.0-M1.jar
spark-core_2.10-1.3.1.jar
spark-streaming_2.10-1.3.1.jar

And my code is from the datastax spark-cassandra-connector demo
https://github.com/datastax/spark-cassandra-connector/blob/master/spark-cassandra-connector-demos/simple-demos/src/main/java/com/datastax/spark/connector/demo/JavaApiDemo.java

Thanks a lot.
yasemin

2015-06-09 18:58 GMT+03:00 Yana Kadiyska yana.kadiy...@gmail.com:

 hm. Yeah, your port is good...have you seen this thread:
 http://stackoverflow.com/questions/27288380/fail-to-use-spark-cassandra-connector
 ? It seems that you might be running into version mis-match issues?

 What versions of Spark/Cassandra-connector are you trying to use?

 On Tue, Jun 9, 2015 at 10:18 AM, Yasemin Kaya godo...@gmail.com wrote:

 Sorry my answer I hit terminal lsof -i:9160: result is

 lsof -i:9160
 COMMAND  PIDUSER   FD   TYPE DEVICE SIZE/OFF NODE NAME
 java7597 inosens  101u  IPv4  85754  0t0  TCP localhost:9160
 (LISTEN)

 so 9160 port is available or not ?

 2015-06-09 17:16 GMT+03:00 Yasemin Kaya godo...@gmail.com:

 Yes my cassandra is listening on 9160 I think. Actually I know from yaml
 file. The file includes :

 rpc_address: localhost
 # port for Thrift to listen for clients on
 rpc_port: 9160

 I check the port nc -z localhost 9160; echo $? it returns me 0. I
 think it close, should I open this port ?

 2015-06-09 16:55 GMT+03:00 Yana Kadiyska yana.kadiy...@gmail.com:

 Is your cassandra installation actually listening on 9160?

 lsof -i :9160COMMAND   PID USER   FD   TYPE   DEVICE SIZE/OFF NODE NAME
 java29232 ykadiysk   69u  IPv4 42152497  0t0  TCP localhost:9160 
 (LISTEN)

 ​
 I am running an out-of-the box cassandra conf where

 rpc_address: localhost
 # port for Thrift to listen for clients on
 rpc_port: 9160



 On Tue, Jun 9, 2015 at 7:36 AM, Yasemin Kaya godo...@gmail.com wrote:

 I couldn't find any solution. I can write but I can't read from
 Cassandra.

 2015-06-09 8:52 GMT+03:00 Yasemin Kaya godo...@gmail.com:

 Thanks alot Mohammed, Gerard and Yana.
 I can write to table, but exception returns me. It says *Exception
 in thread main java.io.IOException: Failed to open thrift connection to
 Cassandra at 127.0.0.1:9160 http://127.0.0.1:9160*

 In yaml file :
 rpc_address: localhost
 rpc_port: 9160

 And at project :

 .set(spark.cassandra.connection.host, 127.0.0.1)
 .set(spark.cassandra.connection.rpc.port, 9160);

 or

 .set(spark.cassandra.connection.host, localhost)
 .set(spark.cassandra.connection.rpc.port, 9160);

 whatever I write setting,  I get same exception. Any help ??


 2015-06-08 18:23 GMT+03:00 Yana Kadiyska yana.kadiy...@gmail.com:

 yes, whatever you put for listen_address in cassandra.yaml. Also,
 you should try to connect to your cassandra cluster via bin/cqlsh to 
 make
 sure you have connectivity before you try to make a a connection via 
 spark.

 On Mon, Jun 8, 2015 at 4:43 AM, Yasemin Kaya godo...@gmail.com
 wrote:

 Hi,
 I run my project on local. How can find ip address of my cassandra
 host ? From cassandra.yaml or ??

 yasemin

 2015-06-08 11:27 GMT+03:00 Gerard Maas gerard.m...@gmail.com:

 ? = ip address of your cassandra host

 On Mon, Jun 8, 2015 at 10:12 AM, Yasemin Kaya godo...@gmail.com
 wrote:

 Hi ,

 How can I find spark.cassandra.connection.host? And what should I
 change ? Should I change cassandra.yaml ?

 Error says me *Exception in thread main java.io.IOException:
 Failed to open native connection to Cassandra at {127.0.1.1}:9042*

 What should I add *SparkConf sparkConf = new
 SparkConf().setAppName(JavaApiDemo).set(**spark.driver.allowMultipleContexts,
 true).set(spark.cassandra.connection.host, ?);*

 Best
 yasemin

 2015-06-06 3:04 GMT+03:00 Mohammed Guller moham...@glassbeam.com
 :

  Check your spark.cassandra.connection.host setting. It should
 be pointing to one of your Cassandra nodes.



 Mohammed



 *From:* Yasemin Kaya [mailto:godo...@gmail.com]
 *Sent:* Friday, June 5, 2015 7:31 AM
 *To:* user@spark.apache.org
 *Subject:* Cassandra Submit



 Hi,



 I am using cassandraDB in my project. I had that error *Exception
 in thread main java.io.IOException: Failed to open native 
 connection to
 Cassandra at {127.0.1.1}:9042*



 I think I have to modify the submit line. What should I add or
 remove when I submit my project?



 Best,

 yasemin





 --

 hiç ender hiç




 --
 hiç ender hiç





 --
 hiç ender hiç





 --
 hiç ender hiç




 --
 hiç ender hiç





 --
 hiç ender hiç




 --
 hiç ender hiç





-- 
hiç ender hiç


Re: Cassandra Submit

2015-06-09 Thread Yasemin Kaya
I removed the core and streaming jars, and the exception is still the same.

I tried what you said; here are the results:

~/cassandra/apache-cassandra-2.1.5$ bin/cassandra-cli -h localhost -p 9160
Connected to: Test Cluster on localhost/9160
Welcome to Cassandra CLI version 2.1.5

The CLI is deprecated and will be removed in Cassandra 3.0.  Consider
migrating to cqlsh.
CQL is fully backwards compatible with Thrift data; see
http://www.datastax.com/dev/blog/thrift-to-cql3

Type 'help;' or '?' for help.
Type 'quit;' or 'exit;' to quit.

[default@unknown]

and

~/cassandra/apache-cassandra-2.1.5$ bin/cqlsh
Connected to Test Cluster at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 2.1.5 | CQL spec 3.2.0 | Native protocol v3]
Use HELP for help.
cqlsh

Thank you for your kind responses ...


2015-06-09 20:59 GMT+03:00 Yana Kadiyska yana.kadiy...@gmail.com:

 Hm, jars look ok, although it's a bit of a mess -- you have spark-assembly
 1.3.0 but then core and streaming 1.3.1...It's generally a bad idea to mix
 versions. Spark-assembly bundless all spark packages, so either do them
 separately or use spark-assembly but don't mix like you've shown.

 As to the port issue -- what about this:

 $bin/cassandra-cli -h localhost -p 9160
 Connected to: Test Cluster on localhost/9160
 Welcome to Cassandra CLI version 2.1.5


 On Tue, Jun 9, 2015 at 1:29 PM, Yasemin Kaya godo...@gmail.com wrote:

 My jar files are:

 cassandra-driver-core-2.1.5.jar
 cassandra-thrift-2.1.3.jar
 guava-18.jar
 jsr166e-1.1.0.jar
 spark-assembly-1.3.0.jar
 spark-cassandra-connector_2.10-1.3.0-M1.jar
 spark-cassandra-connector-java_2.10-1.3.0-M1.jar
 spark-core_2.10-1.3.1.jar
 spark-streaming_2.10-1.3.1.jar

 And my code from datastax spark-cassandra-connector
 https://github.com/datastax/spark-cassandra-connector/blob/master/spark-cassandra-connector-demos/simple-demos/src/main/java/com/datastax/spark/connector/demo/JavaApiDemo.java
 .

 Thanx alot.
 yasemin

 2015-06-09 18:58 GMT+03:00 Yana Kadiyska yana.kadiy...@gmail.com:

 hm. Yeah, your port is good...have you seen this thread:
 http://stackoverflow.com/questions/27288380/fail-to-use-spark-cassandra-connector
 ? It seems that you might be running into version mis-match issues?

 What versions of Spark/Cassandra-connector are you trying to use?

 On Tue, Jun 9, 2015 at 10:18 AM, Yasemin Kaya godo...@gmail.com wrote:

 Sorry my answer I hit terminal lsof -i:9160: result is

 lsof -i:9160
 COMMAND  PIDUSER   FD   TYPE DEVICE SIZE/OFF NODE NAME
 java7597 inosens  101u  IPv4  85754  0t0  TCP localhost:9160
 (LISTEN)

 so 9160 port is available or not ?

 2015-06-09 17:16 GMT+03:00 Yasemin Kaya godo...@gmail.com:

 Yes my cassandra is listening on 9160 I think. Actually I know from
 yaml file. The file includes :

 rpc_address: localhost
 # port for Thrift to listen for clients on
 rpc_port: 9160

 I check the port nc -z localhost 9160; echo $? it returns me 0. I
 think it close, should I open this port ?

 2015-06-09 16:55 GMT+03:00 Yana Kadiyska yana.kadiy...@gmail.com:

 Is your cassandra installation actually listening on 9160?

 lsof -i :9160COMMAND   PID USER   FD   TYPE   DEVICE SIZE/OFF NODE 
 NAME
 java29232 ykadiysk   69u  IPv4 42152497  0t0  TCP localhost:9160 
 (LISTEN)

 ​
 I am running an out-of-the box cassandra conf where

 rpc_address: localhost
 # port for Thrift to listen for clients on
 rpc_port: 9160



 On Tue, Jun 9, 2015 at 7:36 AM, Yasemin Kaya godo...@gmail.com
 wrote:

 I couldn't find any solution. I can write but I can't read from
 Cassandra.

 2015-06-09 8:52 GMT+03:00 Yasemin Kaya godo...@gmail.com:

 Thanks a lot Mohammed, Gerard and Yana.
 I can write to the table, but I get an exception. It says *Exception
 in thread "main" java.io.IOException: Failed to open thrift connection to
 Cassandra at 127.0.0.1:9160*

 In the yaml file:
 rpc_address: localhost
 rpc_port: 9160

 And in the project:

 .set("spark.cassandra.connection.host", "127.0.0.1")
 .set("spark.cassandra.connection.rpc.port", "9160");

 or

 .set("spark.cassandra.connection.host", "localhost")
 .set("spark.cassandra.connection.rpc.port", "9160");

 Whatever setting I write, I get the same exception. Any help??


 2015-06-08 18:23 GMT+03:00 Yana Kadiyska yana.kadiy...@gmail.com:

 yes, whatever you put for listen_address in cassandra.yaml. Also,
 you should try to connect to your cassandra cluster via bin/cqlsh to make
 sure you have connectivity before you try to make a connection via spark.

 On Mon, Jun 8, 2015 at 4:43 AM, Yasemin Kaya godo...@gmail.com
 wrote:

 Hi,
 I run my project locally. How can I find the IP address of my
 Cassandra host? From cassandra.yaml, or somewhere else?

 yasemin

 2015-06-08 11:27 GMT+03:00 Gerard Maas gerard.m...@gmail.com:

 ? = ip address of your cassandra host

 On Mon, Jun 8, 2015 at 10:12 AM, Yasemin Kaya godo...@gmail.com
  wrote:

 Hi ,

 How can I find spark.cassandra.connection.host? And what should
 I change ? Should I change cassandra.yaml ?

 Error says me *Exception

Re: Cassandra Submit

2015-06-08 Thread Yasemin Kaya
Thanks a lot Mohammed, Gerard and Yana.
I can write to the table, but I get an exception. It says *Exception in
thread "main" java.io.IOException: Failed to open thrift connection to
Cassandra at 127.0.0.1:9160*

In the yaml file:
rpc_address: localhost
rpc_port: 9160

And in the project:

.set("spark.cassandra.connection.host", "127.0.0.1")
.set("spark.cassandra.connection.rpc.port", "9160");

or

.set("spark.cassandra.connection.host", "localhost")
.set("spark.cassandra.connection.rpc.port", "9160");

Whatever setting I write, I get the same exception. Any help??


2015-06-08 18:23 GMT+03:00 Yana Kadiyska yana.kadiy...@gmail.com:

 yes, whatever you put for listen_address in cassandra.yaml. Also, you
 should try to connect to your cassandra cluster via bin/cqlsh to make sure
 you have connectivity before you try to make a connection via spark.

 On Mon, Jun 8, 2015 at 4:43 AM, Yasemin Kaya godo...@gmail.com wrote:

 Hi,
 I run my project locally. How can I find the IP address of my Cassandra
 host? From cassandra.yaml, or somewhere else?

 yasemin

 2015-06-08 11:27 GMT+03:00 Gerard Maas gerard.m...@gmail.com:

 ? = ip address of your cassandra host

 On Mon, Jun 8, 2015 at 10:12 AM, Yasemin Kaya godo...@gmail.com wrote:

 Hi ,

 How can I find spark.cassandra.connection.host? And what should I
 change? Should I change cassandra.yaml?

 The error says *Exception in thread "main" java.io.IOException: Failed
 to open native connection to Cassandra at {127.0.1.1}:9042*

 What should I add *SparkConf sparkConf = new
 SparkConf().setAppName("JavaApiDemo").set("spark.driver.allowMultipleContexts",
 "true").set("spark.cassandra.connection.host", ?);*

 Best
 yasemin

 2015-06-06 3:04 GMT+03:00 Mohammed Guller moham...@glassbeam.com:

  Check your spark.cassandra.connection.host setting. It should be
 pointing to one of your Cassandra nodes.



 Mohammed



 *From:* Yasemin Kaya [mailto:godo...@gmail.com]
 *Sent:* Friday, June 5, 2015 7:31 AM
 *To:* user@spark.apache.org
 *Subject:* Cassandra Submit



 Hi,



 I am using cassandraDB in my project. I had that error *Exception in
 thread main java.io.IOException: Failed to open native connection to
 Cassandra at {127.0.1.1}:9042*



 I think I have to modify the submit line. What should I add or remove
 when I submit my project?



 Best,

 yasemin





 --

 hiç ender hiç




 --
 hiç ender hiç





 --
 hiç ender hiç





-- 
hiç ender hiç
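
For reference, a minimal sketch of how the settings discussed above fit together with a read, using the connector's Java API; the keyspace and table names ("test", "words") are hypothetical:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import static com.datastax.spark.connector.japi.CassandraJavaUtil.javaFunctions;

public class CassandraReadSketch {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
            .setAppName("CassandraReadSketch")
            .setMaster("local[2]")
            // Should match rpc_address in cassandra.yaml.
            .set("spark.cassandra.connection.host", "127.0.0.1")
            .set("spark.cassandra.connection.rpc.port", "9160");
        JavaSparkContext sc = new JavaSparkContext(conf);
        // Read the (hypothetical) test.words table and count its rows.
        long count = javaFunctions(sc).cassandraTable("test", "words").count();
        System.out.println("Rows: " + count);
        sc.stop();
    }
}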


Re: Cassandra Submit

2015-06-08 Thread Yasemin Kaya
Hi ,

How can I find spark.cassandra.connection.host? And what should I change?
Should I change cassandra.yaml?

The error says *Exception in thread "main" java.io.IOException: Failed to
open native connection to Cassandra at {127.0.1.1}:9042*

What should I add *SparkConf sparkConf = new
SparkConf().setAppName("JavaApiDemo").set("spark.driver.allowMultipleContexts",
"true").set("spark.cassandra.connection.host", ?);*

Best
yasemin

2015-06-06 3:04 GMT+03:00 Mohammed Guller moham...@glassbeam.com:

  Check your spark.cassandra.connection.host setting. It should be
 pointing to one of your Cassandra nodes.



 Mohammed



 *From:* Yasemin Kaya [mailto:godo...@gmail.com]
 *Sent:* Friday, June 5, 2015 7:31 AM
 *To:* user@spark.apache.org
 *Subject:* Cassandra Submit



 Hi,



 I am using cassandraDB in my project. I had that error *Exception in
 thread main java.io.IOException: Failed to open native connection to
 Cassandra at {127.0.1.1}:9042*



 I think I have to modify the submit line. What should I add or remove when
 I submit my project?



 Best,

 yasemin





 --

 hiç ender hiç




-- 
hiç ender hiç


Re: Cassandra Submit

2015-06-08 Thread Yasemin Kaya
Hi,
I run my project locally. How can I find the IP address of my Cassandra host?
From cassandra.yaml, or somewhere else?

yasemin

2015-06-08 11:27 GMT+03:00 Gerard Maas gerard.m...@gmail.com:

 ? = ip address of your cassandra host

 On Mon, Jun 8, 2015 at 10:12 AM, Yasemin Kaya godo...@gmail.com wrote:

 Hi ,

 How can I find spark.cassandra.connection.host? And what should I change?
 Should I change cassandra.yaml?

 The error says *Exception in thread "main" java.io.IOException: Failed
 to open native connection to Cassandra at {127.0.1.1}:9042*

 What should I add *SparkConf sparkConf = new
 SparkConf().setAppName("JavaApiDemo").set("spark.driver.allowMultipleContexts",
 "true").set("spark.cassandra.connection.host", ?);*

 Best
 yasemin

 2015-06-06 3:04 GMT+03:00 Mohammed Guller moham...@glassbeam.com:

  Check your spark.cassandra.connection.host setting. It should be
 pointing to one of your Cassandra nodes.



 Mohammed



 *From:* Yasemin Kaya [mailto:godo...@gmail.com]
 *Sent:* Friday, June 5, 2015 7:31 AM
 *To:* user@spark.apache.org
 *Subject:* Cassandra Submit



 Hi,



 I am using cassandraDB in my project. I had that error *Exception in
 thread main java.io.IOException: Failed to open native connection to
 Cassandra at {127.0.1.1}:9042*



 I think I have to modify the submit line. What should I add or remove
 when I submit my project?



 Best,

 yasemin





 --

 hiç ender hiç




 --
 hiç ender hiç





-- 
hiç ender hiç


Cassandra Submit

2015-06-05 Thread Yasemin Kaya
Hi,

I am using cassandraDB in my project. I got this error: *Exception in
thread "main" java.io.IOException: Failed to open native connection to
Cassandra at {127.0.1.1}:9042*

I think I have to modify the submit line. What should I add or remove when
I submit my project?

Best,
yasemin


-- 
hiç ender hiç
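
One common fix for this class of error is making sure the connector jars reach the executors, either via spark-submit's --jars flag or programmatically. A minimal sketch of the programmatic route, using the jar names mentioned in these threads; the paths are hypothetical:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class SubmitSketch {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
            .setAppName("JavaApiDemo")
            .set("spark.cassandra.connection.host", "127.0.0.1")
            // Ship the connector jars to the executors; paths are hypothetical.
            .setJars(new String[] {
                "/path/to/spark-cassandra-connector_2.10-1.3.0-M1.jar",
                "/path/to/spark-cassandra-connector-java_2.10-1.3.0-M1.jar",
                "/path/to/cassandra-driver-core-2.1.5.jar"
            });
        JavaSparkContext sc = new JavaSparkContext(conf);
        // ... job code ...
        sc.stop();
    }
}

With spark-submit, the equivalent is passing the same jar list via --jars.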


ALS Rating Object

2015-06-03 Thread Yasemin Kaya
Hi,

I want to use Spark's ALS in my project. I have user IDs
like 30011397223227125563254, and the ALS Rating object wants an Integer
user id, so the id field does not fit into a 32-bit Integer. How can I
solve that? Thanks.

Best,
yasemin
-- 
hiç ender hiç


Re: ALS Rating Object

2015-06-03 Thread Yasemin Kaya
Hi Joseph,

I thought about converting the IDs, but then there is the birthday problem.
The probability of a hash collision
http://preshing.com/20110504/hash-collision-probabilities/ matters to me
because of the number of users. I don't know how I can modify ALS to use
Long instead of Integer.

yasemin


2015-06-04 2:28 GMT+03:00 Joseph Bradley jos...@databricks.com:

 Hi Yasemin,

 If you can convert your user IDs to Integers in pre-processing (if you
 have < a couple billion users), that would work.  Otherwise...
 In Spark 1.3: You may need to modify ALS to use Long instead of Int.
 In Spark 1.4: spark.ml.recommendation.ALS (in the Pipeline API) exposes
 ALS.train as a DeveloperApi to allow users to use Long instead of Int.
 We're also thinking about better ways to permit Long IDs.

 Joseph

 On Wed, Jun 3, 2015 at 5:04 AM, Yasemin Kaya godo...@gmail.com wrote:

 Hi,

 I want to use Spark's ALS in my project. I have the userid
 like 30011397223227125563254 and Rating Object which is the Object of ALS
 wants Integer as a userid so the id field does not fit into a 32 bit
 Integer. How can I solve that ? Thanks.

 Best,
 yasemin
 --
 hiç ender hiç





-- 
hiç ender hiç
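
Following Joseph's first suggestion, a minimal sketch of one collision-free way to convert raw IDs in pre-processing: zipWithIndex over the distinct IDs assigns each user a unique dense index, so there is no birthday problem, and the index fits in an int as long as there are fewer than Integer.MAX_VALUE distinct users. The sample IDs are hypothetical:

import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class IdIndexSketch {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(new SparkConf()
            .setAppName("IdIndexSketch").setMaster("local[2]"));
        // Raw IDs that do not fit into a 32-bit int.
        JavaRDD<String> rawIds = sc.parallelize(Arrays.asList(
            "30011397223227125563254",
            "30011397223227125563254",
            "99887766554433221100"));
        // zipWithIndex gives every distinct ID a unique dense index;
        // unlike hashing, this cannot collide.
        JavaPairRDD<String, Long> idIndex = rawIds.distinct().zipWithIndex();
        // The Long index can be narrowed to int while the distinct-user
        // count stays below Integer.MAX_VALUE.
        for (Tuple2<String, Long> t : idIndex.collect()) {
            System.out.println(t._1() + " -> " + t._2());
        }
        sc.stop();
    }
}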


Cassandra example

2015-06-01 Thread Yasemin Kaya
Hi,

I want to write my RDD to a Cassandra database, and I took an example from
this site:
http://www.datastax.com/dev/blog/accessing-cassandra-from-spark-in-java.
I added it to my project, but I get errors. Here is my project in a gist:
https://gist.github.com/yaseminn/aba86dad9a3e6d6a03dc.

errors:

   - At line 40, Session cannot be resolved
   - At line 106, flatMap is not applicable

Have a nice day
yasemin

-- 
hiç ender hiç
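
On the first error: Session normally comes from com.datastax.driver.core, so a Session that cannot be resolved usually means that import (or the cassandra-driver-core jar) is missing. For the write itself, a minimal sketch using the connector's Java API; the keyspace, table, and bean ("test", "people", Person) are made up for illustration:

import java.io.Serializable;
import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import static com.datastax.spark.connector.japi.CassandraJavaUtil.javaFunctions;
import static com.datastax.spark.connector.japi.CassandraJavaUtil.mapToRow;

public class CassandraWriteSketch {
    // Bean whose properties map to the (hypothetical) table columns id, name.
    public static class Person implements Serializable {
        private Integer id;
        private String name;
        public Person() {}
        public Person(Integer id, String name) { this.id = id; this.name = name; }
        public Integer getId() { return id; }
        public void setId(Integer id) { this.id = id; }
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
    }

    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(new SparkConf()
            .setAppName("CassandraWriteSketch")
            .setMaster("local[2]")
            .set("spark.cassandra.connection.host", "127.0.0.1"));
        JavaRDD<Person> people = sc.parallelize(Arrays.asList(
            new Person(1, "ada"), new Person(2, "alan")));
        // Map the bean to the (hypothetical) test.people table and save it.
        javaFunctions(people)
            .writerBuilder("test", "people", mapToRow(Person.class))
            .saveToCassandra();
        sc.stop();
    }
}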


Collaborative Filtering

2015-05-26 Thread Yasemin Kaya
Hi,

In CF it is implemented like this:

String path = "data/mllib/als/test.data";
JavaRDD<String> data = sc.textFile(path);
JavaRDD<Rating> ratings = data.map(new Function<String, Rating>() {
  public Rating call(String s) {
    String[] sarray = s.split(",");
    return new Rating(Integer.parseInt(sarray[0]),
        Integer.parseInt(sarray[1]), Double.parseDouble(sarray[2]));
  }
});

I want to use CF for my data set, but it is a JavaPairRDD<String,
List<Integer>>. How can I convert my dataset to JavaRDD<Rating>? Thank
you.

Best,
yasemin


-- 
hiç ender hiç
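
A minimal sketch of one way to do the conversion, assuming the String key parses as an int user id and each list position is an item id; it emits one Rating per non-zero entry (the sample data is made up):

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.mllib.recommendation.Rating;
import scala.Tuple2;

public class RatingConvertSketch {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(new SparkConf()
            .setAppName("RatingConvertSketch").setMaster("local[2]"));
        // Sample data shaped like the question: user id -> per-item counts.
        JavaPairRDD<String, List<Integer>> pairs = sc.parallelizePairs(Arrays.asList(
            new Tuple2<String, List<Integer>>("522", Arrays.asList(0, 1, 2, 0))));
        JavaRDD<Rating> ratings = pairs.flatMap(
            new FlatMapFunction<Tuple2<String, List<Integer>>, Rating>() {
                public Iterable<Rating> call(Tuple2<String, List<Integer>> t) {
                    int userId = Integer.parseInt(t._1()); // assumes the key fits in an int
                    List<Rating> out = new ArrayList<Rating>();
                    for (int item = 0; item < t._2().size(); item++) {
                        int value = t._2().get(item);
                        if (value != 0) { // keep only observed entries
                            out.add(new Rating(userId, item, (double) value));
                        }
                    }
                    return out;
                }
            });
        System.out.println(ratings.collect());
        sc.stop();
    }
}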


map reduce ?

2015-05-21 Thread Yasemin Kaya
Hi,

I have a JavaPairRDD<String, List<Integer>> and, as an example, here is what
I want to get:


user_id | cat1 | cat2 | cat3 | cat4
522     |  0   |  1   |  2   |  0
62      |  1   |  0   |  3   |  0
661     |  1   |  2   |  0   |  1


query: the users who have a nonzero value in both the cat2 and cat3 columns
answer: cat2 -> {522, 661}, cat3 -> {522, 62}, so the result is user 522

How can I get this solution?
I think at first I should have a JavaRDD<List<String>> of the users who are
in each column.

Thank you

Best,
yasemin

-- 
hiç ender hiç
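
A minimal sketch of the query itself, using the table above as sample data: keep the users whose list is non-zero in both columns, then collect the keys:

import java.util.Arrays;
import java.util.List;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import scala.Tuple2;

public class CategoryQuerySketch {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(new SparkConf()
            .setAppName("CategoryQuerySketch").setMaster("local[2]"));
        // The table from the question: user_id -> [cat1, cat2, cat3, cat4].
        JavaPairRDD<String, List<Integer>> users = sc.parallelizePairs(Arrays.asList(
            new Tuple2<String, List<Integer>>("522", Arrays.asList(0, 1, 2, 0)),
            new Tuple2<String, List<Integer>>("62",  Arrays.asList(1, 0, 3, 0)),
            new Tuple2<String, List<Integer>>("661", Arrays.asList(1, 2, 0, 1))));
        // Keep users with a non-zero value in both cat2 (index 1) and cat3 (index 2).
        List<String> result = users.filter(
            new Function<Tuple2<String, List<Integer>>, Boolean>() {
                public Boolean call(Tuple2<String, List<Integer>> t) {
                    return t._2().get(1) != 0 && t._2().get(2) != 0;
                }
            }).keys().collect();
        System.out.println(result); // [522]
        sc.stop();
    }
}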


reduceByKey

2015-05-14 Thread Yasemin Kaya
Hi,

I have a JavaPairRDD<String, String> and I want to implement the reduceByKey
method.

My pairRDD :
*2553: 0,0,0,1,0,0,0,0*
46551: 0,1,0,0,0,0,0,0
266: 0,1,0,0,0,0,0,0
*2553: 0,0,0,0,0,1,0,0*

*225546: 0,0,0,0,0,1,0,0*
*225546: 0,0,0,0,0,1,0,0*

I want to get :
*2553: 0,0,0,1,0,1,0,0*
46551: 0,1,0,0,0,0,0,0
266: 0,1,0,0,0,0,0,0
*225546: 0,0,0,0,0,2,0,0*

Can anyone help me get that?
Thank you.

Have a nice day.
yasemin

-- 
hiç ender hiç
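
A minimal sketch of one way to do it, using the input above as sample data: parse each value into numbers, reduceByKey with element-wise addition, and join the sums back into a string:

import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function2;
import scala.Tuple2;

public class VectorSumSketch {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(new SparkConf()
            .setAppName("VectorSumSketch").setMaster("local[2]"));
        JavaPairRDD<String, String> pairs = sc.parallelizePairs(Arrays.asList(
            new Tuple2<String, String>("2553",   "0,0,0,1,0,0,0,0"),
            new Tuple2<String, String>("2553",   "0,0,0,0,0,1,0,0"),
            new Tuple2<String, String>("225546", "0,0,0,0,0,1,0,0"),
            new Tuple2<String, String>("225546", "0,0,0,0,0,1,0,0")));
        // Element-wise sum of the comma-separated vectors per key.
        JavaPairRDD<String, String> summed = pairs.reduceByKey(
            new Function2<String, String, String>() {
                public String call(String a, String b) {
                    String[] xs = a.split(",");
                    String[] ys = b.split(",");
                    StringBuilder out = new StringBuilder();
                    for (int i = 0; i < xs.length; i++) {
                        if (i > 0) out.append(",");
                        out.append(Integer.parseInt(xs[i]) + Integer.parseInt(ys[i]));
                    }
                    return out.toString();
                }
            });
        System.out.println(summed.collect()); // e.g. (2553, 0,0,0,1,0,1,0,0)
        sc.stop();
    }
}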


Re: swap tuple

2015-05-14 Thread Yasemin Kaya
I solved my problem this way:

JavaPairRDD<String, String> swappedPair = pair.mapToPair(
    new PairFunction<Tuple2<String, String>, String, String>() {
        @Override
        public Tuple2<String, String> call(
                Tuple2<String, String> item)
                throws Exception {
            return item.swap();
        }
    });


2015-05-14 20:42 GMT+03:00 Stephen Carman scar...@coldlight.com:

  Yea, I wouldn't try to modify the current one since RDDs are supposed to be
 immutable; just create a new one...

   val newRdd = oldRdd.map(r => (r._2, r._1))

  or something of that nature...

  Steve
  --
 *From:* Evo Eftimov [evo.efti...@isecc.com]
 *Sent:* Thursday, May 14, 2015 1:24 PM
 *To:* 'Holden Karau'; 'Yasemin Kaya'
 *Cc:* user@spark.apache.org
 *Subject:* RE: swap tuple

    Where is the "Tuple" supposed to be in <String, String> - you can
 refer to a "Tuple" if it was e.g. <String, Tuple2<String, String>>



 *From:* holden.ka...@gmail.com [mailto:holden.ka...@gmail.com] *On Behalf
 Of *Holden Karau
 *Sent:* Thursday, May 14, 2015 5:56 PM
 *To:* Yasemin Kaya
 *Cc:* user@spark.apache.org
 *Subject:* Re: swap tuple



 Can you paste your code? Transformations return a new RDD rather than
 modifying an existing one, so if you were to swap the values of the tuple
 using a map you would get back a new RDD, and then you would want to try to
 print this new RDD instead of the original one.

 On Thursday, May 14, 2015, Yasemin Kaya godo...@gmail.com wrote:

 Hi,



 I have a *JavaPairRDD<String, String>* and I want to *swap tuple._1() with
 tuple._2()*. I use *tuple.swap()*, but the JavaPairRDD is not actually
 changed. When I print the JavaPairRDD, the values are the same.



 Anyone can help me for that?



 Thank you.

 Have nice day.



 yasemin



 --

 hiç ender hiç



 --

 Cell : 425-233-8271

 Twitter: https://twitter.com/holdenkarau

 Linked In: https://www.linkedin.com/in/holdenkarau


This e-mail is intended solely for the above-mentioned recipient and
 it may contain confidential or privileged information. If you have received
 it in error, please notify us immediately and delete the e-mail. You must
 not copy, distribute, disclose or take any action in reliance on it. In
 addition, the contents of an attachment to this e-mail may contain software
 viruses which could damage your own computer system. While ColdLight
 Solutions, LLC has taken every reasonable precaution to minimize this risk,
 we cannot accept liability for any damage which you sustain as a result of
 software viruses. You should perform your own virus checks before opening
 the attachment.




-- 
hiç ender hiç


swap tuple

2015-05-14 Thread Yasemin Kaya
Hi,

I have a *JavaPairRDD<String, String>* and I want to *swap tuple._1() with
tuple._2()*. I use *tuple.swap()*, but the JavaPairRDD is not actually
changed. When I print the JavaPairRDD, the values are the same.

Anyone can help me for that?

Thank you.
Have nice day.

yasemin

-- 
hiç ender hiç


JavaPairRDD

2015-05-13 Thread Yasemin Kaya
Hi,

I want to get a *JavaPairRDD<String, String>* from the tuple part of a
*JavaPairRDD<String, Tuple2<String, String>>*.

As an example:
(http://www.koctas.com.tr/reyon/el-aletleri/7, (0,1,0,0,0,0,0,0,46551))
is in my *JavaPairRDD<String, Tuple2<String, String>>* and I want to get
*( (46551), (0,1,0,0,0,0,0,0) )*

I tried to split tuple._2() and create a new JavaPairRDD, but I can't.
How can I get that?

Have a nice day
yasemin
-- 
hiç ender hiç


Re: JavaPairRDD

2015-05-13 Thread Yasemin Kaya
Thank you Tristan. It is totally what I am looking for :)


2015-05-14 5:05 GMT+03:00 Tristan Blakers tris...@blackfrog.org:

 You could use a map() operation, but the easiest way is probably to just
 call the values() method on the JavaPairRDD<A, B> to get a JavaRDD<B>.

 See this link:

 https://www.safaribooksonline.com/library/view/learning-spark/9781449359034/ch04.html

 Tristan
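
A minimal sketch combining the two steps, with sample data shaped like the question; the inner tuple is assumed to be (vector, id):

import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.PairFunction;
import scala.Tuple2;

public class ValuesSketch {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(new SparkConf()
            .setAppName("ValuesSketch").setMaster("local[2]"));
        // (url, (vector, id)) pairs, shaped like the question.
        JavaPairRDD<String, Tuple2<String, String>> input = sc.parallelizePairs(
            Arrays.asList(new Tuple2<String, Tuple2<String, String>>(
                "http://www.koctas.com.tr/reyon/el-aletleri/7",
                new Tuple2<String, String>("0,1,0,0,0,0,0,0", "46551"))));
        // values() drops the url key; mapToPair re-keys by the id.
        JavaPairRDD<String, String> byId = input.values().mapToPair(
            new PairFunction<Tuple2<String, String>, String, String>() {
                public Tuple2<String, String> call(Tuple2<String, String> t) {
                    return t.swap(); // (id, vector)
                }
            });
        System.out.println(byId.collect()); // [(46551, 0,1,0,0,0,0,0,0)]
        sc.stop();
    }
}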





 On 13 May 2015 at 23:12, Yasemin Kaya godo...@gmail.com wrote:

 Hi,

 I want to get a *JavaPairRDD<String, String>* from the tuple part of a
 *JavaPairRDD<String, Tuple2<String, String>>*.

 As an example:
 (http://www.koctas.com.tr/reyon/el-aletleri/7, (0,1,0,0,0,0,0,0,46551)) is in
 my *JavaPairRDD<String, Tuple2<String, String>>* and I want to get
 *( (46551), (0,1,0,0,0,0,0,0) )*

 I tried to split tuple._2() and create a new JavaPairRDD, but I can't.
 How can I get that?

 Have a nice day
 yasemin
 --
 hiç ender hiç





-- 
hiç ender hiç


Content based filtering

2015-05-12 Thread Yasemin Kaya
Hi, is content-based filtering available in Spark MLlib? If it isn't,
what can I use as an alternative? Thank you.

Have a nice day
yasemin

-- 
hiç ender hiç


Spark Mongodb connection

2015-05-04 Thread Yasemin Kaya
Hi!

I am new to Spark, and I want to begin with a simple wordCount example
in Java. But I want to read my input from a MongoDB database. I want to
learn how I can connect a MongoDB database to my project. Can anyone help
with this issue?

Have a nice day
yasemin

-- 
hiç ender hiç
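
One common route is the mongo-hadoop connector, which exposes a collection as a Hadoop InputFormat that Spark can read. A minimal sketch, assuming the mongo-hadoop (and MongoDB Java driver) jars are on the classpath; the database and collection names in the URI are hypothetical:

import org.apache.hadoop.conf.Configuration;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.bson.BSONObject;
import com.mongodb.hadoop.MongoInputFormat;

public class MongoInputSketch {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(new SparkConf()
            .setAppName("MongoInputSketch").setMaster("local[2]"));
        Configuration mongoConf = new Configuration();
        // Hypothetical database "test" and collection "docs".
        mongoConf.set("mongo.input.uri", "mongodb://localhost:27017/test.docs");
        // Each record is (ObjectId, document) from the collection.
        JavaPairRDD<Object, BSONObject> docs = sc.newAPIHadoopRDD(
            mongoConf, MongoInputFormat.class, Object.class, BSONObject.class);
        System.out.println("Documents read: " + docs.count());
        sc.stop();
    }
}

From there, the usual wordCount pattern (flatMap over a text field, mapToPair, reduceByKey) applies to the document values.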