Re: Difference in R and Spark Output

2017-01-02 Thread Saroj C
Thanks Satya. I tried setting initSteps to 25 and maxIteration to 500, in both R and Spark; the results provided below came from those settings. Also, within Spark and within R the centers remain almost the same across runs, but they differ between the two. Thanks & Regards Saroj From: Satya

Aw: Re: Re: Spark Streaming prediction

2017-01-02 Thread Daniela S
Dear Marco, no problem, and thank you very much for your help! Yes, that is correct. I always know the minute values for roughly the next 180 minutes (this may vary between devices), and I want to predict the values for the next 24 hours (one value per minute). So as long as I know the values

Re: Re: Spark Streaming prediction

2017-01-02 Thread Marco Mistroni
Apologies, perhaps I misunderstood your use case. My assumption was that you have 2-3 hours' worth of data and you want to know the values for the next 24 based on the values you already have; that is why I suggested the ML path. If that is not the case, please ignore everything I said.. so, let's

Aw: Re: Spark Streaming prediction

2017-01-02 Thread Daniela S
Hi, thank you very much for your answer! My problem is that I know the values for the next 2-3 hours in advance, but I do not know the values from hour 2 or 3 up to hour 24. How is it possible to combine the known values with the predicted values, given that both lie in the future? And how can I

Re: Spark Streaming prediction

2017-01-02 Thread Marco Mistroni
Hi, you might want to have a look at the regression ML algorithms and integrate one into your Spark Streaming application; I am sure someone on the list has a similar use case. In short, you'd want to process all your events and feed them through an ML model which, based on your inputs, will predict output

Re: Issue with SparkR setup on RStudio

2017-01-02 Thread Felix Cheung
Perhaps it is the spark.sql.warehouse.dir="E:/Exp/" that you have in the sparkConfig parameter. Unfortunately the exception stack is fairly far away from the actual error, but off the top of my head, spark.sql.warehouse.dir and HADOOP_HOME are the two different pieces that are not set in the

Re: Broadcast destroy

2017-01-02 Thread Anastasios Zouzias
Hi Bryan, I think the ContextCleaner will take care of the broadcast variables; see, e.g., https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-service-contextcleaner.html If it is easy to spot when to clean up the broadcast variables in your case, a "xBroadcasted.destroy()"

Re: What's the best practice to load data from RDMS to Spark

2017-01-02 Thread Jacek Laskowski
FYI, option works with boolean literals directly. Jacek On 30 Dec 2016 9:32 p.m., "Palash Gupta" wrote: > Hi, > > If you want to load from csv, you can use the procedure below. Of course you > need to define the spark context first. (Given example to load all csv under

Spark Streaming prediction

2017-01-02 Thread Daniela S
Hi, I am trying to solve the following problem with Spark Streaming. I receive timestamped events from Kafka. Each event refers to a device and contains values for every minute of the next 2 to 3 hours. What I would like to do is to predict the minute values for the next 24 hours. So I would

Re: Issue with SparkR setup on RStudio

2017-01-02 Thread Md. Rezaul Karim
Hello Cheung, Happy New Year! No, I did not configure Hive on my machine. I have even tried not setting HADOOP_HOME, but I get the same error. Regards, _ *Md. Rezaul Karim* BSc, MSc PhD Researcher, INSIGHT Centre for Data Analytics National University of

Spark Converting dataframe to Rdd reduces partitions

2017-01-02 Thread manish jaiswal
Hi, I am hitting an issue when converting a DataFrame to an RDD: it reduces the number of partitions. In our code, the DataFrame was created as: DataFrame DF = hiveContext.sql("select * from table_instance"); When I convert my DataFrame to an RDD and try to get its number of partitions, as RDD newRDD = Df.rdd();

RE: Broadcast destroy

2017-01-02 Thread bryan.jeffrey
All, Anyone have a thought? Thank you, Bryan Jeffrey From: bryan.jeff...@gmail.com Sent: Friday, December 30, 2016 1:20 PM To: user Subject: Broadcast destroy All, If we are updating broadcast variables, do we need to manually destroy the replaced broadcast, or will it be automatically

Re: Difference in R and Spark Output

2017-01-02 Thread Satya Varaprasad Allumallu
Can you run the Spark KMeans algorithm multiple times and see if the centers remain stable? I am guessing it is related to the random initialization of centers. On Mon, Jan 2, 2017 at 1:34 AM, Saroj C wrote: > Dear Felix, > Thanks. Please find the differences > > Cluster Spark - Size

RE: What is missing here to use sql in spark?

2017-01-02 Thread Mendelson, Assaf
sqlContext.sql("select distinct CARRIER from flight201601") defines a DataFrame, which is lazily evaluated. This means it returns a DataFrame (which is what you got). If you want to see the results, do: sqlContext.sql("select distinct CARRIER from flight201601").show() or df =