Re: Kafka createDirectStream issue

2015-09-19 Thread kali.tumm...@gmail.com
Hi, I am trying to develop the same code in IntelliJ IDEA and I am having the same issue; is there any workaround? Error in IntelliJ: cannot resolve symbol createDirectStream. import kafka.serializer.StringDecoder import org.apache.spark._ import org.apache.spark.SparkContext._ import
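An unresolved `createDirectStream` in IntelliJ usually means the separate Kafka streaming artifact is not on the module's classpath. A minimal sketch of the fix, assuming sbt and Spark 1.5.0 (the version is an assumption; match it to your Spark release):

```scala
// build.sbt (separate file) — the direct Kafka API ships in its own artifact:
//   libraryDependencies += "org.apache.spark" %% "spark-streaming-kafka" % "1.5.0"

// With that artifact on the classpath, the import and call resolve:
import kafka.serializer.StringDecoder
import org.apache.spark.streaming.kafka.KafkaUtils

// KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
//   ssc, kafkaParams, topics)
```

After adding the dependency, refresh the sbt/Maven project in IntelliJ so the IDE re-indexes the new jar.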

Re: Unable to see my kafka spark streaming output

2015-09-19 Thread kali.tumm...@gmail.com
Hi All, figured it out: I forgot to mention the master as local[2]; at least two local threads are required. package com.examples /** * Created by kalit_000 on 19/09/2015. */ import org.apache.spark._ import org.apache.spark.SparkContext._ import org.apache.spark.sql.SQLContext import org.apache.spark.SparkConf
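The reason `local[2]` matters: a streaming receiver permanently occupies one thread, so with `local` (a single thread) nothing is left to process batches and no output ever appears. A minimal sketch of the corrected setup:

```scala
// A receiver takes one thread; "local[2]" leaves at least one thread
// free to run the batch-processing tasks.
val conf = new SparkConf()
  .setAppName("StreamingExample")
  .setMaster("local[2]")
```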

Re: question building spark in a virtual machine

2015-09-19 Thread Eyal Altshuler
I allocated almost 6GB of RAM to the Ubuntu virtual machine and got the same problem. I will go over this post and try to zoom in on the Java VM settings. Meanwhile, can someone with a working Ubuntu machine specify their JVM settings? Thanks, Eyal On Sat, Sep 19, 2015 at 7:49 PM, Ted Yu

Re: in joins, does one side stream?

2015-09-19 Thread Rishitesh Mishra
Hi Reynold, Can you please elaborate on this? I thought an RDD also opens only an iterator. Does it get materialized for joins? Rishi On Saturday, September 19, 2015, Reynold Xin wrote: > Yes for RDD -- both are materialized. No for DataFrame/SQL - one side > streams. > > >

Re: PrunedFilteredScan does not work for UDTs and Struct fields

2015-09-19 Thread Zhan Zhang
Hi Richard, I am not sure how to support user-defined types. But regarding your second question, you can use a workaround as follows. Suppose you have a struct a, and want to filter with a.c > X. You can define an alias C as a.c, and add an extra column C to the schema of the relation,
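A DataFrame-level illustration of the same idea (names are assumptions): surface the struct field as a flat top-level column, then filter on that column so the comparison is on an attribute the filter-pushdown machinery recognizes.

```scala
// df has a struct column `a`; expose a.c as a flat column C and filter on C
// instead of a.c, so the predicate reaches buildScan as a pushable filter.
val flattened = df.withColumn("C", df("a.c"))
val result = flattened.filter(flattened("C") > 10)
```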

Unable to see my kafka spark streaming output

2015-09-19 Thread kali.tumm...@gmail.com
Hi All, I am unable to see the output getting printed in the console; can anyone help? package com.examples /** * Created by kalit_000 on 19/09/2015. */ import org.apache.spark._ import org.apache.spark.SparkContext._ import org.apache.spark.sql.SQLContext import org.apache.spark.SparkConf

PrunedFilteredScan does not work for UDTs and Struct fields

2015-09-19 Thread Richard Eggert
I defined my own relation (extending BaseRelation) and implemented the PrunedFilteredScan interface, but discovered that if the column referenced in a WHERE clause is a user-defined type or a field of a struct column, then Spark SQL passes NO filters to the PrunedFilteredScan.buildScan method,

Re: in joins, does one side stream?

2015-09-19 Thread Reynold Xin
The RDDs themselves are not materialized, but the implementations can materialize. E.g. in cogroup (which is used by RDD.join), it materializes all the data during grouping. In SQL/DataFrame join, depending on the join: 1. For broadcast join, only the smaller side is materialized in memory as a
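For the broadcast case Reynold describes, the hint can also be given explicitly; a minimal sketch (the DataFrame names `large` and `small` are assumptions):

```scala
import org.apache.spark.sql.functions.broadcast

// Only `small` is materialized (shipped to every executor as a hash table);
// rows of `large` stream past it without being collected in memory.
val joined = large.join(broadcast(small), "id")
```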

DataGenerator for streaming application

2015-09-19 Thread Saiph Kappa
Hi, I am trying to build a data generator that feeds a streaming application. This data generator just reads a file and sends its lines through a socket. I get no errors in the logs, and the benchmark below always prints "Received 0 records". Am I doing something wrong? object MyDataGenerator {
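A common cause of "Received 0 records" is writing to the socket without a trailing newline or without flushing, so the receiver never sees a complete line. A minimal sketch of a generator that avoids both pitfalls (port and data are placeholders):

```scala
import java.io.PrintWriter
import java.net.ServerSocket

// Serves a sequence of lines to one client, flushing after every line so a
// streaming receiver (e.g. socketTextStream) sees data immediately.
object MyDataGenerator {
  def serve(lines: Seq[String], port: Int): Unit = {
    val server = new ServerSocket(port)
    val socket = server.accept()                              // wait for the receiver
    val out = new PrintWriter(socket.getOutputStream, true)   // autoFlush = true
    lines.foreach(out.println)                                // println adds the newline
    out.close(); socket.close(); server.close()
  }
}
```

Also double-check that the streaming side connects to the same host and port the generator listens on.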

Re: Spark on Mesos with Jobs in Cluster Mode Documentation

2015-09-19 Thread Timothy Chen
You can still provide properties through the Docker container by putting configuration in the conf directory, but we try to pass through all properties submitted from the driver via spark-submit, which I believe will override the defaults. Is this not what you are seeing? Tim > On Sep 19, 2015,

Re: Spark on Mesos with Jobs in Cluster Mode Documentation

2015-09-19 Thread Tim Chen
I guess I need a bit more clarification, what kind of assumptions was the dispatcher making? Tim On Thu, Sep 17, 2015 at 10:18 PM, Alan Braithwaite wrote: > Hi Tim, > > Thanks for the follow up. It's not so much that I expect the executor to > inherit the configuration

Re: Zeppelin on Yarn : org.apache.spark.SparkException: Detected yarn-cluster mode, but isn't running on a cluster. Deployment to YARN is not supported directly by SparkContext. Please use spark-submi

2015-09-19 Thread Ewan Leith
yarn-client still runs the executor tasks on the cluster, the main difference is where the driver job runs. Thanks, Ewan -- Original message-- From: shahab Date: Fri, 18 Sep 2015 13:11 To: Aniket Bhatnagar; Cc: user@spark.apache.org; Subject:Re: Zeppelin on Yarn :

Docker/Mesos with Spark

2015-09-19 Thread John Omernik
I was searching the 1.5.0 docs on the Docker on Mesos capabilities and just found you CAN run it this way. Are there any user posts, blog posts, etc. on why and how you'd do this? Basically, at first I was questioning why you'd run Spark in a Docker container, i.e., if you run with tar balled

Re: Using Spark for portfolio manager app

2015-09-19 Thread Jörn Franke
If you want to be able to let your users query their portfolio then you may want to think about storing the current state of the portfolios in HBase/Phoenix, or alternatively a cluster of relational databases can make sense. For the rest you may use Spark. On Sat, Sep 19, 2015 at 4:43 AM, Thúy Hằng Lê

question building spark in a virtual machine

2015-09-19 Thread Eyal Altshuler
Hi, Trying to build Spark in my Ubuntu virtual machine, I am getting the following error: "Error occurred during initialization of VM Could not reserve enough space for object heap Error: could not create the Java Virtual Machine. Error: A fatal exception has occurred. Program will exit". I have

word count (group by users) in spark

2015-09-19 Thread kali.tumm...@gmail.com
Hi All, I would like to achieve the output below using Spark. I managed to write it in Hive and call it in Spark, but not in just Spark (Scala). How do I group word counts by a particular user (column)? For example, imagine users and their tweets; I want to do a word count based on user name.

Re: question building spark in a virtual machine

2015-09-19 Thread Ted Yu
Can you tell us how you configured the JVM heap size? Which version of Java are you using? When I build Spark, I do the following: export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m" Cheers On Sat, Sep 19, 2015 at 5:31 AM, Eyal Altshuler

Re: question building spark in a virtual machine

2015-09-19 Thread Eyal Altshuler
Hi, I configured the MAVEN_OPTS environment variable the same as you wrote. My Java version is 1.7.0_75. I didn't customize the JVM heap size specifically. Is there an additional configuration I have to set besides MAVEN_OPTS? Thanks, Eyal On Sat, Sep 19, 2015 at 5:29 PM,

Re: question building spark in a virtual machine

2015-09-19 Thread Aniket Bhatnagar
Hi Eyal, Can you check if your Ubuntu VM has enough RAM allocated to run a JVM of size 3GB? Thanks, Aniket On Sat, Sep 19, 2015, 9:09 PM Eyal Altshuler wrote: > Hi, > > I had configured the MAVEN_OPTS environment variable the same as you wrote. > My java version is
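One quick way to answer Aniket's question from inside the VM, sketched below (column layout assumes a modern procps `free`; `java -Xmx3g -version` is the most direct test if a JDK is installed):

```shell
# Memory the kernel could hand out right now, in MB ("available" column).
# Compare it against 3072 MB before trying a 3 GB JVM heap.
free -m | awk '/^Mem:/{print "available MB: " $7}'
```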

Re: question building spark in a virtual machine

2015-09-19 Thread Ted Yu
See also this thread: https://bukkit.org/threads/complex-craftbukkit-server-and-java-problem-could-not-reserve-enough-space-for-object-heap.155192/ Cheers On Sat, Sep 19, 2015 at 8:51 AM, Aniket Bhatnagar < aniket.bhatna...@gmail.com> wrote: > Hi Eyal > > Can you check if your Ubuntu VM has

Re: word count (group by users) in spark

2015-09-19 Thread Aniket Bhatnagar
Using the Scala API, you can first group by user and then use combineByKey. Thanks, Aniket On Sat, Sep 19, 2015, 6:41 PM kali.tumm...@gmail.com wrote: > Hi All, > I would like to achieve this below output using spark , I managed to write > in Hive and call it in spark but
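The group-then-count shape can be sketched with plain Scala collections (sample data assumed); the same structure maps onto RDDs with `groupByKey`/`combineByKey`:

```scala
// Per-user word count over (user, tweet) pairs.
val tweets = Seq(
  ("kali", "spark spark streaming"),
  ("kali", "spark sql"),
  ("sri",  "hive on spark")
)

val counts: Map[String, Map[String, Int]] =
  tweets
    .groupBy(_._1)                                     // group tweets by user
    .map { case (user, rows) =>
      val words = rows.flatMap(_._2.split("\\s+"))     // all words for this user
      user -> words.groupBy(identity).map { case (w, ws) => (w, ws.size) }
    }

println(counts("kali")("spark"))   // prints 3
```

With RDDs, prefer `reduceByKey`/`combineByKey` on `((user, word), 1)` pairs over `groupByKey`, since it combines map-side and avoids shipping every word across the shuffle.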

Re: question building spark in a virtual machine

2015-09-19 Thread Eyal Altshuler
Hi, I allocated 4GB for the Ubuntu virtual machine; how do I check the maximum heap available to a JVM process? Regarding the thread - I see it's related to building on Windows. Thanks, Eyal On Sat, Sep 19, 2015 at 6:54 PM, Ted Yu wrote: > See also this thread: > >