union of multiple twitter streams [spark-streaming-twitter_2.11]

2018-07-02 Thread Imran Rajjad
Hello, Has anybody tried to union two streams of Twitter Statuses? I am instantiating two Twitter streams with two different sets of credentials and passing them through a union function, but the console shows no activity, nor are there any errors. --static function that returns
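A minimal sketch of unioning two Twitter DStreams in Java, assuming conf, auth1 and auth2 (twitter4j Authorization objects built from the two credential sets) already exist; note that a streaming job shows no console activity at all unless it has an output operation and start() is called:

    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;
    import org.apache.spark.streaming.twitter.TwitterUtils;
    import twitter4j.Status;

    JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));
    JavaDStream<Status> stream1 = TwitterUtils.createStream(jssc, auth1);
    JavaDStream<Status> stream2 = TwitterUtils.createStream(jssc, auth2);
    JavaDStream<Status> merged = stream1.union(stream2);
    merged.count().print();   // an output operation is required, or nothing runs
    jssc.start();
    jssc.awaitTermination();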

Re: unable to connect to cluster 2.2.0

2017-12-06 Thread Imran Rajjad
; Richard
> From: Imran Rajjad <raj...@gmail.com>
> Date: Wednesday, December 6, 2017 at 2:45 AM
> To: "user @spark" <user@spark.apache.org>
> Subject: unable to connect to cluster 2.2.0
>
> Hi,

unable to connect to cluster 2.2.0

2017-12-05 Thread Imran Rajjad
Hi, I recently upgraded from 2.1.1 to 2.2.0 and my streaming job seems to have broken: the submitted application is unable to connect to the cluster even though everything is running. Below is my stack trace. Spark Master: spark://192.168.10.207:7077 Job Arguments: -appName orange_watch -directory

spark structured csv file stream not detecting new files

2017-11-15 Thread Imran Rajjad
Greetings, I am running a unit test designed to stream a folder into which I am manually copying csv files. The files do not always get picked up; they are only detected when the job starts with the files already in the folder. I even tried the fileNameOnly option newly added in 2.2.0.
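A minimal sketch of the file-stream setup in Java, with a hypothetical schema and folder path; one common cause of files going undetected is copying them into the watched folder in place (so the source sees a partial file), rather than writing them elsewhere and atomically moving them in:

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.types.DataTypes;
    import org.apache.spark.sql.types.StructType;

    StructType schema = new StructType()
        .add("id", DataTypes.IntegerType)        // hypothetical columns
        .add("name", DataTypes.StringType);
    Dataset<Row> csvStream = spark.readStream()
        .schema(schema)                          // file streams require a schema
        .option("fileNameOnly", "true")          // dedupe on file name only (2.2.0+)
        .csv("/path/to/watched/folder");
    csvStream.writeStream().format("console").start();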

spark-stream memory table global?

2017-11-10 Thread Imran Rajjad
Hi, Is the in-memory table into which spark-structured streaming results are sinked available to other spark applications on the cluster? Is it global by default, or only available to the context where the streaming is being done? thanks Imran -- I.R
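For reference: the memory sink registers its table in the catalog of the owning SparkSession, so it is visible only within that application, not globally across the cluster. A minimal sketch (streamingDf and the query name are placeholders):

    import org.apache.spark.sql.streaming.StreamingQuery;

    StreamingQuery query = streamingDf.writeStream()
        .format("memory")
        .queryName("stream_results")   // table name inside this SparkSession only
        .outputMode("complete")
        .start();
    // Queryable only from the same application:
    spark.sql("SELECT * FROM stream_results").show();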

unable to run spark streaming example

2017-11-03 Thread Imran Rajjad
I am trying out the network word count example, and my unit test is producing the below console output with an exception: Exception in thread "dispatcher-event-loop-5" java.lang.NoClassDefFoundError: scala/runtime/AbstractPartialFunction$mcVL$sp at java.lang.ClassLoader.defineClass1(Native Method)

Re: partition by multiple columns/keys

2017-10-23 Thread Imran Rajjad
Strangely, this is working only for a very small set of rows; for very large datasets the partitioning apparently does not work. Is there a limit to the number of columns or rows when repartitioning by multiple columns? regards, Imran On Wed, Oct 18, 2017 at 11:00 AM, Imran Rajjad
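A sketch of repartitioning by multiple columns in Java. Note that repartition(Column...) hash-partitions: every row with the same (col1, col2) lands in one partition, but one partition can hold many key combinations (200 partitions by default), which may be what makes large datasets look "unpartitioned":

    import static org.apache.spark.sql.functions.col;
    import org.apache.spark.api.java.function.ForeachPartitionFunction;

    Dataset<Row> counts = df.groupBy("col1", "col2", "col3").count();
    Dataset<Row> byKey = counts.repartition(col("col1"), col("col2"));
    byKey.foreachPartition((ForeachPartitionFunction<Row>) rows -> {
        // rows sharing a (col1, col2) value are co-located here, but the
        // iterator may interleave several different key combinations
        rows.forEachRemaining(r -> System.out.println(r));
    });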

Re: jar file problem

2017-10-19 Thread Imran Rajjad
A simple way is to mount a network volume with the same name on every node, to make things easy. On Thu, 19 Oct 2017 at 8:24 PM Uğur Sopaoğlu wrote: > Hello, > > I have a very easy problem: when I run a spark job, I must copy the jar file to > all worker nodes. Is there any simpler way?
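An alternative sketch: point the SparkConf at jars on a shared location (the path below is hypothetical), so Spark itself ships them to the executors instead of copying by hand:

    SparkConf conf = new SparkConf()
        .setAppName("myJob")
        // hypothetical shared path; HDFS or HTTP URLs also work here
        .setJars(new String[] { "/mnt/shared/jars/myJob.jar" });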

Re: partition by multiple columns/keys

2017-10-18 Thread Imran Rajjad
--
[12,24,3,1]
[12,22,4,1]
[11,22,1,1]
[11,22,2,1]
[11,21,1,1]
[13,22,4,1]
On Wed, Oct 18, 2017 at 10:29 AM, ayan guha <guha.a...@gmail.com> wrote:
> How or what yo

Re: No space left on device

2017-10-17 Thread Imran Rajjad
I don't think so; check the documentation for this method. On Wed, Oct 18, 2017 at 10:11 AM, Mina Aslani <aslanim...@gmail.com> wrote: > I have not tried rdd.unpersist(); I thought using rdd = null is the same, > is it not? > > On Wed, Oct 18, 2017 at 1:07 AM, Imran Rajjad

partition by multiple columns/keys

2017-10-17 Thread Imran Rajjad
Hi, I have a set of rows that are the result of a groupBy(col1,col2,col3).count(). Is it possible to map the rows belonging to a unique combination inside an iterator? e.g.

col1 col2 col3
a    1    a1
a    1    a2
b    2    b1
b    2    b2

how can I separate rows with col1 and col2 =
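One way to get exactly an iterator per unique key combination is groupByKey plus flatMapGroups on the Dataset API. A minimal Java sketch, with column names as in the example above and an output type chosen purely for illustration:

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.spark.api.java.function.FlatMapGroupsFunction;
    import org.apache.spark.api.java.function.MapFunction;
    import org.apache.spark.sql.Encoders;

    Dataset<String> perGroup = df.groupByKey(
            (MapFunction<Row, String>) r -> r.getAs("col1") + "|" + r.getAs("col2"),
            Encoders.STRING())
        .flatMapGroups(
            (FlatMapGroupsFunction<String, Row, String>) (key, rows) -> {
                List<String> out = new ArrayList<>();
                while (rows.hasNext()) {        // all rows for one (col1, col2)
                    out.add(key + " -> " + rows.next().getAs("col3"));
                }
                return out.iterator();
            },
            Encoders.STRING());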

Re: No space left on device

2017-10-17 Thread Imran Rajjad
Did you try calling rdd.unpersist()? On Wed, Oct 18, 2017 at 10:04 AM, Mina Aslani wrote: > Hi, > > I get a "No space left on device" error in my spark worker: > > Error writing stream to file /usr/spark-2.2.0/work/app-.../0/stderr > java.io.IOException: No space left on
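For context, a sketch of the difference (the input path is hypothetical): unpersist() actively frees the cached blocks on the executors, while rdd = null only drops the local reference and leaves the blocks around until the driver's GC lets the ContextCleaner reclaim them:

    JavaRDD<String> rdd = sc.textFile("/data/input.txt").cache();
    long n = rdd.count();     // materializes the cache
    rdd.unpersist();          // blocks are removed from executor storage now
    // rdd = null;            // by contrast, this frees nothing immediately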

task not serializable on simple operations

2017-10-16 Thread Imran Rajjad
Is there a way around implementing a separate Java class with the Serializable interface even for small, petty arithmetic operations? Below is code from the simple decision tree example: Double testMSE = predictionAndLabel.map(new Function<Tuple2<Double, Double>, Double>() { @Override
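One answer, sketched under the assumption that predictionAndLabel is a JavaPairRDD<Double, Double> and data is the input RDD, as in the MLlib decision tree example: with Java 8 lambdas no separate class is needed, since org.apache.spark.api.java.function.Function already extends Serializable and a lambda that captures no outer state serializes cleanly (anonymous inner classes, by contrast, capture the enclosing instance, which is a frequent cause of "task not serializable"):

    double testMSE = predictionAndLabel
        .map(pl -> Math.pow(pl._1() - pl._2(), 2.0))   // squared error per record
        .reduce((a, b) -> a + b) / data.count();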

Re: Apache Spark-Subtract two datasets

2017-10-12 Thread Imran Rajjad
If the datasets hold objects of different classes, then you will have to convert both of them to RDDs and rename the columns before you call rdd1.subtract(rdd2). On Thu, Oct 12, 2017 at 10:16 PM, Shashikant Kulkarni < shashikant.kulka...@gmail.com> wrote: > Hello, > > I have 2 datasets,
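A sketch of both routes (ds1 and ds2 are placeholders): if the two Datasets can be brought to the same schema, except() does the set difference directly; otherwise drop to RDDs of Rows and use subtract():

    // Same schema: set difference on the Dataset API
    Dataset<Row> diff = ds1.toDF().except(ds2.toDF());

    // Different classes: align the Row layouts (rename columns) first, then
    JavaRDD<Row> result = ds1.toDF().javaRDD()
        .subtract(ds2.toDF().javaRDD());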

Re: best spark spatial lib?

2017-10-11 Thread Imran Rajjad
spatial
> and logical operators combined? Maybe I am not understanding the issue
> properly
>
> Ram
>
> On Tue, Oct 10, 2017 at 2:21 AM, Imran Rajjad <raj...@gmail.com> wrote:
>
>> I need to have a location column inside my Dataframe so that I can do
>> spati

best spark spatial lib?

2017-10-10 Thread Imran Rajjad
I need to have a location column inside my Dataframe so that I can do spatial queries and geometry operations. Are there any third-party packages that perform these kinds of operations? I have seen a few, like GeoSpark and Magellan, but they don't support operations where spatial and logical operators

Re: [Spark-Submit] Where to store data files while running job in cluster mode?

2017-09-29 Thread Imran Rajjad
Try Tachyon.. it's less fuss. On Fri, 29 Sep 2017 at 8:32 PM lucas.g...@gmail.com wrote: > We use S3; there are caveats and issues with that, but it can be made to > work. > > If interested, let me know and I'll show you our workarounds. I wouldn't > do it naively though,

Re: graphframes on cluster

2017-09-22 Thread Imran Rajjad
but I don't have many dependencies apart from a few POJOs, which have been included through the context. regards, Imran On Wed, Sep 20, 2017 at 9:00 PM, Felix Cheung <felixcheun...@hotmail.com> wrote: > Could you include the code where it fails? > Generally the best way to use gf is to use

graphframes on cluster

2017-09-20 Thread Imran Rajjad
Trying to run graphframes on a spark cluster. Do I need to include the package in the spark context settings, or is only the driver program supposed to have the graphframes libraries on its classpath? Currently the job crashes when an action function is invoked on graphframe classes. regards,
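The executors need the graphframes jars too, not just the driver. The usual route (mirroring the spark-shell invocation further down this list) is the --packages flag at submit time, which resolves the package and its transitive dependencies and ships them to driver and executors alike. A sketch with hypothetical application coordinates:

    spark-submit --master spark://master:7077 \
      --packages graphframes:graphframes:0.5.0-spark2.1-s_2.11 \
      --class com.example.GraphJob app.jar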

Re: graphframe out of memory

2017-09-08 Thread Imran Rajjad
alexander.com/blog/post/java/java-xmx-xms-memory-heap-size-control

On Thu, Sep 7, 2017 at 12:16 PM, Imran Rajjad <raj...@gmail.com> wrote:
> I am getting Out of Memory error while running connectedComponents job on
> graph with around 12000 vertices and 1346

graphframe out of memory

2017-09-07 Thread Imran Rajjad
I am getting an Out of Memory error while running a connectedComponents job on a graph with around 12000 vertices and 134600 edges. I am running spark in embedded mode in a standalone Java application and have tried to increase the memory, but it seems that it's not taking any effect. sparkConf = new
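A likely explanation, offered as an assumption: in an embedded local-mode application the driver JVM is already running by the time the SparkConf is built, so spark.driver.memory set in code cannot take effect; the heap has to be sized when the JVM itself starts. A sketch:

    // Size the heap at JVM launch instead, e.g.:
    //   java -Xmx8g -cp app.jar com.example.GraphApp
    SparkConf sparkConf = new SparkConf()
        .setAppName("connectedComponents")
        .setMaster("local[*]");
    // setting "spark.driver.memory" here is too late -- the heap is fixed;
    // in local mode the executors share the driver JVM, so -Xmx governs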

unable to import graphframes

2017-08-29 Thread Imran Rajjad
Dear list, I am following the graphframes documentation and have started the scala shell using the following command: D:\spark-2.1.0-bin-hadoop2.7\bin>spark-shell --master local[2] --packages graphframes:graphframes:0.5.0-spark2.1-s_2.10 Ivy Default Cache set to: C:\Users\user\.ivy2\cache The
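A probable cause, noted as an assumption: the prebuilt Spark 2.1.0 binaries are compiled against Scala 2.11, so the Scala 2.10 graphframes artifact (…s_2.10) will not load there; the matching coordinate would be the s_2.11 build:

    D:\spark-2.1.0-bin-hadoop2.7\bin>spark-shell --master local[2] --packages graphframes:graphframes:0.5.0-spark2.1-s_2.11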

Re: Oracle Table not resolved [Spark 2.1.1]

2017-08-28 Thread Imran Rajjad
The JDBC URL is invalid, but strangely it should have thrown an ORA- exception. On Mon, Aug 28, 2017 at 4:55 PM, Naga G <gudurun...@gmail.com> wrote: > Not able to find the database name. > Is ora the database in the below url? > > Sent from Naga iPad > > > On Aug 28, 201

Re: Spark SQL vs HiveQL

2017-08-28 Thread Imran Rajjad
If reading directly from a file, then Spark SQL should be your choice. On Mon, Aug 28, 2017 at 10:25 PM Michael Artz wrote: > Just to be clear, I'm referring to having Spark read from a file, not > from a Hive table. And it will have the tungsten engine off heap

Oracle Table not resolved [Spark 2.1.1]

2017-08-28 Thread Imran Rajjad
Hello, I am trying to retrieve an oracle table into a Dataset using the following code: String url = "jdbc:oracle@localhost:1521:ora"; Dataset<Row> jdbcDF = spark.read() .format("jdbc") .option("driver", "oracle.jdbc.driver.OracleDriver") .option("url", url) .option("dbtable",
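The URL above is the likely culprit: the Oracle thin driver expects a ":thin:@host:port:SID" form. A corrected sketch (table name and credentials are placeholders):

    String url = "jdbc:oracle:thin:@localhost:1521:ora";   // note the ":thin:" segment
    Dataset<Row> jdbcDF = spark.read()
        .format("jdbc")
        .option("driver", "oracle.jdbc.driver.OracleDriver")
        .option("url", url)
        .option("dbtable", "MY_TABLE")   // hypothetical table
        .option("user", "scott")         // hypothetical credentials
        .option("password", "tiger")
        .load();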

Thrift-Server JDBC ResultSet Cursor Reset or Previous

2017-08-16 Thread Imran Rajjad
Dear List, Are there any future plans to implement cursor-reset or previous-record functionality in the Thrift Server's JDBC driver? Are there any other alternatives? java.sql.SQLException: Method not supported at org.apache.hive.jdbc.HiveBaseResultSet.previous(HiveBaseResultSet.java:643) regards
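One client-side workaround, offered as a sketch rather than a Thrift Server feature: copy the forward-only ResultSet into a javax.sql.rowset.CachedRowSet, which materializes the rows in memory and then supports previous() and beforeFirst():

    import java.sql.ResultSet;
    import javax.sql.rowset.CachedRowSet;
    import javax.sql.rowset.RowSetProvider;

    CachedRowSet cached = RowSetProvider.newFactory().createCachedRowSet();
    cached.populate(resultSet);      // pulls all rows client-side; mind the memory
    while (cached.next()) { /* forward pass */ }
    cached.beforeFirst();            // cursor reset, served from the cache
    cached.next();
    cached.previous();               // no longer "Method not supported"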

solr data source not working

2017-07-20 Thread Imran Rajjad
I am unable to register the Solr Cloud as a data source in Spark 2.1.0. Following the documentation at https://github.com/lucidworks/spark-solr#import-jar-file-via-spark-shell, I have used the 3.0.0.beta3 version. The system path is displaying the added jar as
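For reference, a minimal read through the spark-solr data source as its README describes it, with hypothetical ZooKeeper and collection names; if the "solr" format is not found at all, the jar never actually made it onto the classpath despite being listed:

    Dataset<Row> solrDF = spark.read()
        .format("solr")
        .option("zkhost", "zk1:2181,zk2:2181/solr")  // hypothetical ensemble
        .option("collection", "myCollection")        // hypothetical collection
        .load();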

Slow response on Solr Cloud with Spark

2017-07-19 Thread Imran Rajjad
Greetings, We are trying out Spark 2 + ThriftServer to join multiple collections from a Solr Cloud (6.4.x). I have followed this blog https://lucidworks.com/2015/08/20/solr-spark-sql-datasource/ I understand that initially spark populates the temporary table with 18633014 records and takes its