Re: Spark Job Server application compilation issue

2018-03-14 Thread sujeet jog
o include the errors you get once you're going to be asking them a > question > > On Wed, Mar 14, 2018 at 1:37 PM, sujeet jog <sujeet@gmail.com> wrote: > >> >> Input is a json request, which would be decoded in myJob() & processed >> further. >> >&

Spark Job Server application compilation issue

2018-03-14 Thread sujeet jog
Input is a JSON request, which would be decoded in myJob() & processed further. Not sure what is wrong with the code below; it emits errors about unimplemented methods (runJob/validate). Any pointers on this would be helpful. jobserver-0.8.0: object MyJobServer extends SparkSessionJob { type JobData
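
For reference, a minimal sketch of a job against the 0.8.x API, modelled on the job-server examples (the input.json parameter and the runJob body are illustrative stand-ins, not the poster's actual myJob() logic). The usual cause of "unimplemented runJob/validate" errors is implementing the old SparkJob signatures while extending the new trait, whose methods take a JobEnvironment and whose validate returns JobData Or Every[ValidationProblem]:

    import com.typesafe.config.Config
    import org.apache.spark.sql.SparkSession
    import org.scalactic._
    import spark.jobserver.SparkSessionJob
    import spark.jobserver.api.{JobEnvironment, SingleProblem, ValidationProblem}

    import scala.util.Try

    object MyJobServer extends SparkSessionJob {
      type JobData = String   // the raw JSON request
      type JobOutput = Long

      def runJob(spark: SparkSession, runtime: JobEnvironment, data: JobData): JobOutput = {
        import spark.implicits._
        // decode the JSON request and process it; a trivial stand-in for the real work
        spark.read.json(Seq(data).toDS()).count()
      }

      def validate(spark: SparkSession, runtime: JobEnvironment, config: Config):
          JobData Or Every[ValidationProblem] =
        Try(config.getString("input.json"))
          .map(Good(_))
          .getOrElse(Bad(One(SingleProblem("no input.json parameter"))))
    }

Package and class names follow the job-server 0.8.0 examples and may differ slightly between versions.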

running Spark-JobServer in eclipse

2018-03-03 Thread sujeet jog
Is there a way to run Spark-JobServer in Eclipse? Any pointers in this regard?

read parallel processing spark-cassandra

2018-02-13 Thread sujeet jog
Folks, I have a time series table with each record having 350 columns. The primary key is ((date, bucket), objectid, timestamp). The objective is to read 1 day's worth of data, which comes to around 12k partitions; each partition has around 25MB of data. I see only 1 task active during the read
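
One way to spread such a read over many tasks (a sketch, not from the thread; the keyspace, table, and bucket range are placeholders): build the day's (date, bucket) partition keys on the driver, parallelize them, and join them against the table with the DataStax connector, so each Spark task reads its own slice of partition keys.

    import com.datastax.spark.connector._

    // hypothetical key class whose field names match the table's partition key columns
    case class DayBucket(date: String, bucket: Int)

    // assuming `sc` is the active SparkContext and buckets 0..99 exist per date
    val day = "2018-02-12"
    val keys = sc.parallelize((0 until 100).map(DayBucket(day, _)), 100)

    // one connector read per chunk of keys instead of a single sequential scan
    val rows = keys.joinWithCassandraTable("my_keyspace", "timeseries_tbl")
    println(rows.count())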

Spark Docker

2017-12-25 Thread sujeet jog
Folks, Can you share your experience of running Spark under Docker on a single local / standalone node? Anybody using it in production environments? We have an existing Docker Swarm deployment, and I want to run Spark in a separate FAT VM hooked into / controlled by Docker Swarm. I know there is

running dockerized spark applications in DC/OS

2017-08-31 Thread sujeet jog
Folks, Does anybody have production experience running dockerized Spark applications on DC/OS, and can the Spark cluster run in a mode other than Spark standalone? What are the major differences between running Spark with the Mesos cluster manager vs. running Spark as a dockerized container under

Re: Cassandra querying time stamps

2017-06-20 Thread sujeet jog
Correction. On Tue, Jun 20, 2017 at 5:27 PM, sujeet jog <sujeet@gmail.com> wrote: > , Below is the query, looks like from physical plan, the query is same as > that of cqlsh, > > val query = s"""(select * from model_data > where TimeStamp

Re: Cassandra querying time stamps

2017-06-20 Thread sujeet jog
at 5:13 PM, Riccardo Ferrari <ferra...@gmail.com> wrote: > Hi, > > Personally I would inspect how dates are managed. How does your spark code > looks like? What does the explain say. Does TimeStamp gets parsed the same > way? > > Best, > > On Tue, Jun 20, 2017 at

Cassandra querying time stamps

2017-06-20 Thread sujeet jog
Hello, I have a table as below:

    CREATE TABLE analytics_db.ml_forecast_tbl (
        "MetricID" int,
        "TimeStamp" timestamp,
        "ResourceID" timeuuid,
        "Value" double,
        PRIMARY KEY ("MetricID", "TimeStamp", "ResourceID")
    )

    select * from ml_forecast_tbl where "MetricID" = 1 and "TimeStamp" >
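
To check how the same slice behaves from Spark, a sketch (not from the thread) that reads the table through the DataStax connector and uses explain() to see whether the TimeStamp bounds are pushed down; the date literals are illustrative and `spark` is assumed to be an existing SparkSession.

    import spark.implicits._

    val df = spark.read
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "analytics_db", "table" -> "ml_forecast_tbl"))
      .load()

    val slice = df.filter($"MetricID" === 1 &&
      $"TimeStamp" >= "2017-06-01 00:00:00" && $"TimeStamp" < "2017-06-02 00:00:00")

    // the physical plan shows whether the MetricID / TimeStamp predicates reach Cassandra
    slice.explain(true)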

Re: JSON Arrays and Spark

2016-10-12 Thread sujeet jog
I generally use the Play Framework APIs for complex JSON structures. https://www.playframework.com/documentation/2.5.x/ScalaJson#Json On Wed, Oct 12, 2016 at 11:34 AM, Kappaganthu, Sivaram (ES) < sivaram.kappagan...@adp.com> wrote: > Hi, > > > > Does this mean that handling any Json with kind of
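
As a minimal sketch of that approach (the Metric case class and its fields are illustrative, and play-json is assumed to be on the classpath): define a case class for the structure, derive a Reads for it, and validate incoming JSON against it.

    import play.api.libs.json._

    case class Metric(id: Int, values: Seq[Double])

    object PlayJsonExample {
      // derive a Reads[Metric] from the case class definition
      implicit val metricReads: Reads[Metric] = Json.reads[Metric]

      def main(args: Array[String]): Unit = {
        val raw = """{"id": 1, "values": [16.95, 20.19, 15.70]}"""
        Json.parse(raw).validate[Metric] match {
          case JsSuccess(metric, _) => println(metric)
          case JsError(errors)      => println(s"bad request: $errors")
        }
      }
    }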

Convert RDD to JSON Rdd and append more information

2016-09-20 Thread sujeet jog
Hi, I have an RDD of n rows; I want to transform this to a JSON RDD and also add some more information. Any idea how to accomplish this? ex: I have an RDD with n rows with data like below: 16.9527493170273,20.1989561393151,15.7065424947394
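
A rough sketch of one way to do it (the extra fields "tag" and "count" are illustrative, not from the thread): map each CSV-like line to a JSON string, attaching whatever additional information is needed.

    // assuming `sc` is the active SparkContext
    val rdd = sc.parallelize(Seq("16.9527493170273,20.1989561393151,15.7065424947394"))

    val jsonRdd = rdd.map { line =>
      val values = line.split(",").map(_.trim)
      // build one JSON document per row and append extra metadata fields
      s"""{"tag": "timeseries", "count": ${values.length}, "values": [${values.mkString(",")}]}"""
    }

    jsonRdd.collect().foreach(println)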

Partition n keys into exacly n partitions

2016-09-12 Thread sujeet jog
Hi, Is there a way to partition a set of data with n keys into exactly n partitions? For ex: a tuple of 1008 rows with key x, a tuple of 1008 rows with key y, and so on, for a total of 10 keys (x, y, etc.). Total records = 10080, NumOfKeys = 10. I want to partition the 10080 elements into exactly 10
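
A custom Partitioner is the usual answer; a minimal sketch (key type String is assumed for illustration) that gives each distinct key its own partition:

    import org.apache.spark.Partitioner

    class ExactKeyPartitioner(keys: Seq[String]) extends Partitioner {
      private val index = keys.zipWithIndex.toMap
      override def numPartitions: Int = keys.size
      override def getPartition(key: Any): Int = index(key.asInstanceOf[String])
    }

    // usage: pairRdd is an RDD[(String, Row)] whose keys are the 10 known values
    // val partitioned = pairRdd.partitionBy(new ExactKeyPartitioner(Seq("x", "y", "z")))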

Re: iterating over DataFrame Partitions sequentially

2016-09-10 Thread sujeet jog
On Fri, Sep 9, 2016 at 11:45 AM, Jakob Odersky <ja...@odersky.com> wrote: > > Hi Sujeet, > > > > going sequentially over all parallel, distributed data seems like a > > counter-productive thing to do. What are you trying to accomplish? > > > > regards, >

iterating over DataFrame Partitions sequentially

2016-09-09 Thread sujeet jog
Hi, Is there a way to iterate over a DataFrame with n partitions sequentially? Thanks, Sujeet
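
Two common workarounds, sketched below (not from the thread): either submit one job per partition index, or stream partitions to the driver with a local iterator.

    import org.apache.spark.sql.{DataFrame, Row}

    def processSequentially(df: DataFrame): Unit = {
      val rdd = df.rdd

      // Option 1: one job per partition, pulling that partition's rows to the driver
      (0 until rdd.getNumPartitions).foreach { i =>
        val rows: Array[Row] =
          rdd.sparkContext.runJob(rdd, (it: Iterator[Row]) => it.toArray, Seq(i)).head
        println(s"partition $i has ${rows.length} rows")
      }

      // Option 2: a local iterator walks the partitions in order, one at a time
      rdd.toLocalIterator.foreach(_ => ())
    }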

Re: Dataframe write to DB, losing primary key index & data types.

2016-08-24 Thread sujeet jog
There was an inherent bug in my code which did this. On Wed, Aug 24, 2016 at 8:07 PM, sujeet jog <sujeet@gmail.com> wrote: > Hi, > > I have a table with definition as below , when i write any records to this > table, the varchar(20 ) gets changes to text, and it also losses

Dataframe write to DB, losing primary key index & data types.

2016-08-24 Thread sujeet jog
Hi, I have a table with the definition below. When I write any records to this table, the varchar(20) gets changed to text, and it also loses the primary key index. Any idea how to write data with Spark SQL without losing the primary key index & data types? MariaDB [analytics]> show
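
One pattern that preserves the DDL, sketched under the assumption that the target table already exists with the right types and primary key (connection details and table name are placeholders): append with the JDBC writer instead of letting Spark (re)create the table, since an overwrite re-creates it with Spark-inferred types.

    import java.util.Properties
    import org.apache.spark.sql.SaveMode

    val props = new Properties()
    props.setProperty("user", "analytics")
    props.setProperty("password", "***")

    // SaveMode.Append inserts into the existing table and leaves its DDL untouched;
    // SaveMode.Overwrite would drop and re-create it (varchar -> text, no primary key)
    df.write
      .mode(SaveMode.Append)
      .jdbc("jdbc:mysql://localhost:3306/analytics", "my_table", props)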

Re: call a mysql stored procedure from spark

2016-08-15 Thread sujeet jog
;> wrote: >> >>> As described here >>> <http://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases>, >>> you can use the DataSource API to connect to an external database using >>> JDBC. While the dbtable option is usually just a table name, it can >>> also be any valid SQL command that returns a table when enclosed in >>> (parentheses). I'm not certain, but I'd expect you could use this feature >>> to invoke a stored procedure and return the results as a DataFrame. >>> >>> On Sat, Aug 13, 2016 at 10:40 AM, sujeet jog <sujeet@gmail.com> >>> wrote: >>> >>>> Hi, >>>> >>>> Is there a way to call a stored procedure using spark ? >>>> >>>> >>>> thanks, >>>> Sujeet >>>> >>> >>> >>
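
Building on the dbtable hint above, a rough sketch (not from the thread): MySQL generally does not allow CALL inside a subquery, so one workaround is to invoke the procedure on the driver with plain JDBC and then read the table it populates through the DataSource API; the procedure name, result table, and connection details are all hypothetical.

    import java.sql.DriverManager

    val url = "jdbc:mysql://localhost:3306/analytics"   // placeholder connection details

    // step 1: run the stored procedure with a plain JDBC CallableStatement on the driver
    val conn = DriverManager.getConnection(url, "user", "***")
    try {
      val stmt = conn.prepareCall("{call refresh_forecast(?)}")   // hypothetical procedure
      stmt.setInt(1, 7)
      stmt.execute()
    } finally {
      conn.close()
    }

    // step 2: load the (hypothetical) result table as a DataFrame; `spark` is a SparkSession
    val df = spark.read
      .format("jdbc")
      .option("url", url)
      .option("dbtable", "(select * from forecast_results) t")
      .option("user", "user").option("password", "***")
      .load()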

call a mysql stored procedure from spark

2016-08-13 Thread sujeet jog
Hi, Is there a way to call a stored procedure using Spark? Thanks, Sujeet

Re: update specific rows to DB using sqlContext

2016-08-11 Thread sujeet jog
y and all responsibility for any > loss, damage or destruction of data or any other property which may arise > from relying on this email's technical content is explicitly disclaimed. > The author will in no case be liable for any monetary damages arising from > such loss, damage or dest

Re: update specific rows to DB using sqlContext

2016-08-11 Thread sujeet jog
t is explicitly disclaimed. > The author will in no case be liable for any monetary damages arising from > such loss, damage or destruction. > > > > On 9 August 2016 at 13:39, sujeet jog <sujeet@gmail.com> wrote: > >> Hi, >> >> Is it possible to update ce

update specific rows to DB using sqlContext

2016-08-09 Thread sujeet jog
Hi, Is it possible to update certain column records in a DB from Spark? For example, I have 10 rows with 3 columns which are read from Spark SQL; I want to update specific column entries and write back to the DB, but since RDDs are immutable I believe this would be difficult. Is there a
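
The DataFrame/JDBC writer can only insert, so a common workaround (a sketch, not from the thread; table, column names, and connection details are placeholders) is to push UPDATE statements per partition with plain JDBC:

    import java.sql.DriverManager

    df.rdd.foreachPartition { rows =>
      val conn = DriverManager.getConnection("jdbc:mysql://localhost:3306/analytics", "user", "***")
      val stmt = conn.prepareStatement("UPDATE my_table SET value = ? WHERE id = ?")
      try {
        rows.foreach { row =>
          stmt.setDouble(1, row.getAs[Double]("value"))
          stmt.setInt(2, row.getAs[Int]("id"))
          stmt.executeUpdate()
        }
      } finally {
        stmt.close()
        conn.close()
      }
    }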

Re: how to run local[k] threads on a single core

2016-08-05 Thread sujeet jog
Spark does not support thread to CPU >> affinity. >> > On Aug 4, 2016, at 14:27, sujeet jog <sujeet@gmail.com> wrote: >> > >> > Is there a way we can run multiple tasks concurrently on a single core >> in local mode. >> > >>

how to run local[k] threads on a single core

2016-08-04 Thread sujeet jog
Is there a way we can run multiple tasks concurrently on a single core in local mode? For ex: I have 5 partitions ~ 5 tasks and only a single core; I want these tasks to run concurrently and to specify that they use / run on a single core. The machine itself has say 4 cores, but I want to utilize

Re: Load selected rows with sqlContext in the dataframe

2016-07-22 Thread sujeet jog
Thanks Todd. On Thu, Jul 21, 2016 at 9:18 PM, Todd Nist <tsind...@gmail.com> wrote: > You can set the dbtable to this: > > .option("dbtable", "(select * from master_schema where 'TID' = '100_0')") > > HTH, > > Todd > > > On Thu, Jul 21, 2

Load selected rows with sqlContext in the dataframe

2016-07-21 Thread sujeet jog
I have a table of size 5GB and want to load selected rows into a dataframe instead of loading the entire table in memory. For me memory is a constraint, and I would like to periodically load a few sets of rows and perform dataframe operations on them. For the "dbtable" is there a way to
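
Expanding on the dbtable answer above, a sketch (the where-clause, partition column, and bounds are illustrative; `spark` is an existing SparkSession): push the row selection into the subquery, and optionally split the selected slice across several read tasks.

    val df = spark.read
      .format("jdbc")
      .option("url", "jdbc:mysql://localhost:3306/analytics")        // placeholder
      .option("dbtable", "(select * from master_schema where TID = '100_0') t")
      .option("user", "user").option("password", "***")
      // optional: parallelize the read over a numeric column of the selected rows
      .option("partitionColumn", "id")
      .option("lowerBound", "0")
      .option("upperBound", "100000")
      .option("numPartitions", "4")
      .load()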

Re: Using R code as part of a Spark Application

2016-06-30 Thread sujeet jog
nality coming in Spark 2.0, such as >>>> > "dapply". You could use SparkR to load a Parquet file and then run >>>> "dapply" >>>> > to apply a function to each partition of a DataFrame. >>>> > >>>> > Info

Re: Using R code as part of a Spark Application

2016-06-29 Thread sujeet jog
Try Spark pipeRDDs; you can invoke the R script from pipe and push the stuff you want to do onto the Rscript's stdin. On Wed, Jun 29, 2016 at 7:10 PM, Gilad Landau wrote: > Hello, > > > > I want to use R code as part of spark application (the same way I would do >
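
A minimal sketch of the pipe approach (forecast.R is a hypothetical script that reads one record per line from stdin and writes one result per line to stdout; `sc` is the active SparkContext):

    val input = sc.parallelize(Seq("16.95,20.19,15.70", "12.10,14.30,11.80"))

    // each partition's elements are streamed to the external process's stdin;
    // the script has to be available on every worker (e.g. shipped with --files)
    val output = input.pipe("Rscript forecast.R")

    output.collect().foreach(println)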

Re: Spark jobs

2016-06-29 Thread sujeet jog
check if this helps,

    from multiprocessing import Process
    import os

    def training():
        print("Training Workflow")
        cmd = "spark/bin/spark-submit ./ml.py &"   # launch the job asynchronously
        os.system(cmd)

    w_training = Process(target=training)

On Wed, Jun 29, 2016 at 6:28 PM, Joaquin Alzola

Re: Can we use existing R model in Spark

2016-05-30 Thread sujeet jog
Try to invoke an R script from Spark using the RDD pipe method, get the work done, and receive the model back in an RDD. For ex: rdd.pipe("") On Mon, May 30, 2016 at 3:57 PM, Sun Rui wrote: > Unfortunately no. Spark does not support loading external models (for

Re: local Vs Standalone cluster production deployment

2016-05-28 Thread sujeet jog
s clear cut answer to NOT to use local mode in prod. >> Others may have different opinions on this. >> >> HTH >> >> >> >> Dr Mich Talebzadeh >> >> >> >> LinkedIn * >> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6

Re: local Vs Standalone cluster production deployment

2016-05-28 Thread sujeet jog
itor or the logs created >>> >>> HTH >>> >>> >>> >>> Dr Mich Talebzadeh >>> >>> >>> >>> LinkedIn * >>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw >>>

Re: local Vs Standalone cluster production deployment

2016-05-28 Thread sujeet jog
om/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>* > > > > http://talebzadehmich.wordpress.com > > > > On 28 May 2016 at 18:03, sujeet jog <sujeet@gmail.com> wrote: > >> Thanks Ted, >> >> Thanks Mich, yes i see that i can run two

Re: local Vs Standalone cluster production deployment

2016-05-28 Thread sujeet jog
Web GUI >>> on 4040 to see the progress of this Job. If you start the next JVM then >>> assuming it is working, it will be using port 4041 and so forth. >>> >>> >>> In actual fact try the command "free" to see how much free memory you >>

local Vs Standalone cluster production deployment

2016-05-28 Thread sujeet jog
Hi, I have a question w.r.t. the production deployment mode of Spark. I have 3 applications which I would like to run independently on a single machine; I need to run the drivers on the same machine. The amount of resources I have is also limited, like 4-5GB RAM, 3-4 cores. For deployment in
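
For the standalone-cluster case, a hedged sketch (not from the thread; the master URL and the numbers are illustrative) of capping each of the three applications so that together they fit in roughly 4-5GB / 3-4 cores:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("app-1")
      .master("spark://localhost:7077")        // placeholder standalone master URL
      .config("spark.cores.max", "1")          // at most 1 core for this application
      .config("spark.executor.memory", "1g")   // per-executor memory cap
      .getOrCreate()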

sparkApp on standalone/local mode with multithreading

2016-05-25 Thread sujeet jog
I had a few questions w.r.t. Spark deployment & the way I want to use it; it would be helpful if you can answer a few. I plan to use Spark on an embedded switch, which has a limited set of resources, like say 1 or 2 dedicated cores and 1.5GB of memory. I want to model network traffic with time series

Re: Scala vs Python for Spark ecosystem

2016-04-20 Thread sujeet jog
It depends on the trade-offs you wish to make. Python being an interpreted language, the speed of execution will be lower, but it being a very commonly used language, people can jump in hands-on quickly. Scala programs run in the Java environment, so it's obvious you will get good execution speed,

Re: Aggregate subsequent x row values together.

2016-03-28 Thread sujeet jog
; Can you describe your use case a bit more ? > > Since the row keys are not sorted in your example, there is a chance that > you get indeterministic results when you aggregate on groups of two > successive rows. > > Thanks > > On Mon, Mar 28, 2016 at 9:21 AM, sujeet jog <

Aggregate subsequent x row values together.

2016-03-28 Thread sujeet jog
Hi, I have an RDD like this: [12, 45] [14, 50] [10, 35] [11, 50]. I want to aggregate the values of the first two rows into 1 row and subsequently the next two rows into another single row... I don't have a key to aggregate on for using some of the aggregate pyspark functions; how do I achieve it?
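
One approach, sketched in Scala (the same zipWithIndex / reduceByKey idea carries over to PySpark), assuming the RDD's order is meaningful so that consecutive positions really are the rows to be combined: derive a synthetic key from the row position and reduce every pair of rows together.

    // assuming `sc` is the active SparkContext
    val rdd = sc.parallelize(Seq((12, 45), (14, 50), (10, 35), (11, 50)))

    val pairsOfTwo = rdd
      .zipWithIndex()                                // attach a stable position to each row
      .map { case (row, idx) => (idx / 2, row) }     // rows 0,1 -> key 0; rows 2,3 -> key 1; ...
      .reduceByKey { case ((a1, b1), (a2, b2)) => (a1 + a2, b1 + b2) }
      .sortByKey()
      .values

    pairsOfTwo.collect().foreach(println)            // (26,95), (21,85)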

Run External R script from Spark

2016-03-21 Thread sujeet jog
Hi, I have been working on a POC on some time-series-related stuff. I'm using Python since I need Spark Streaming and SparkR is yet to have a Spark Streaming front end; a couple of algorithms I want to use are not yet present in the Spark-TS package, so I'm thinking of invoking an external R script for