JavaRDD with custom class?

2016-04-12 Thread Daniel Valdivia
Hi, I'm moving some code from Scala to Java and I just hit a wall: I'm trying to move an RDD with a custom data structure to Java, but I'm not able to do so. Scala code: case class IncodentDoc(system_id: String, category: String, terms: Seq[String]) var incTup =
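For context, a minimal Scala sketch of the pattern quoted above; the sample data and the sc SparkContext (as in spark-shell) are assumptions, since the original message is truncated:

    case class IncodentDoc(system_id: String, category: String, terms: Seq[String])

    val docs = Seq(
      IncodentDoc("sys-1", "network", Seq("timeout", "retry")),
      IncodentDoc("sys-2", "storage", Seq("disk", "full"))
    )
    val rdd = sc.parallelize(docs)   // RDD[IncodentDoc]; this is the RDD being ported to Java

On the Java side this usually means a separate class implementing java.io.Serializable (or one registered with Kryo) in place of the case class, since Java has no case classes.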

How to deal with same class mismatch?

2016-02-01 Thread Daniel Valdivia
Hi, I'm having a couple of issues. I'm experiencing a known issue in the spark-shell where I'm getting a type mismatch for the right class: <console>:82: error: type mismatch; found :

Set Hadoop User in Spark Shell

2016-01-14 Thread Daniel Valdivia
Hi, I'm trying to set the value of a Hadoop parameter within spark-shell, and System.setProperty("HADOOP_USER_NAME", "hadoop") does not seem to be doing the trick. Does anyone know how I can set the hadoop.job.ugi parameter from within spark-shell? Cheers
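For reference, a minimal sketch of one commonly suggested alternative, assuming the goal is to influence the Hadoop side from an already-running shell. Whether hadoop.job.ugi is honored depends on the Hadoop version, and exporting HADOOP_USER_NAME in the environment before launching spark-shell is the other route usually mentioned; HADOOP_USER_NAME is read from the environment, which is why setting it as a JVM system property after the shell has started typically has no effect.

    // set the value on the SparkContext's Hadoop configuration instead of as a JVM property
    sc.hadoopConfiguration.set("hadoop.job.ugi", "hadoop,hadoop")   // "hadoop,hadoop" (user,group) is an assumption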

Re: Put all elements of RDD into array

2016-01-11 Thread Daniel Valdivia
; as the assignment value to your val. >> >> In case you ever want to append values iteratively, search for how to use >> scala "ArrayBuffer"s. Also, keep in mind that RDDs have a foreach method, so >> no need to call collect followed by foreach. >> >> r
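A minimal sketch of the approaches mentioned in the reply, assuming an existing rdd of type RDD[String]:

    val arr = rdd.collect()              // pulls every element back to the driver as an Array
    rdd.foreach(x => println(x))         // or act on elements directly on the executors, no collect needed

    import scala.collection.mutable.ArrayBuffer
    val buf = ArrayBuffer[String]()      // for iterative appends, as the reply suggests
    arr.foreach(buf += _)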

Monitor Job on Yarn

2016-01-04 Thread Daniel Valdivia
Hello everyone, happy new year. I submitted an app to YARN, however I'm unable to monitor its progress on the driver node, neither on :8080 nor :4040 as documented. When submitting in standalone mode I could monitor it, but that doesn't seem to be the case right now. I submitted my app this way:

Re: Monitor Job on Yarn

2016-01-04 Thread Daniel Valdivia
ark.apache.org/docs/latest/running-on-yarn.html> > > Note spark.yarn.historyServer.address > FYI > > On Mon, Jan 4, 2016 at 2:49 PM, Daniel Valdivia <h...@danielvaldivia.com> wrote: > Hello everyone, happy new year, >
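A minimal sketch of wiring up the setting named in the reply, with placeholder values; these are normally put in spark-defaults.conf rather than set programmatically:

    import org.apache.spark.{SparkConf, SparkContext}
    val conf = new SparkConf()
      .setAppName("my-app")                                           // hypothetical app name
      .set("spark.yarn.historyServer.address", "historyhost:18080")   // assumed host:port
      .set("spark.eventLog.enabled", "true")
      .set("spark.eventLog.dir", "hdfs:///spark-logs")                // assumed log directory
    val sc = new SparkContext(conf)

While an application is still running, the YARN ResourceManager web UI (port 8088 by default) also links to the application UI.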

Re: Can't submit job to stand alone cluster

2015-12-29 Thread Daniel Valdivia
..@databricks.com> > wrote: > > > Hi Greg, > > It's actually intentional for standalone cluster mode to not upload jars. One > of the reasons why YARN takes at least 10 seconds before running any simple > application is because there's a lot of random overhead (e.g.

Can't submit job to stand alone cluster

2015-12-28 Thread Daniel Valdivia
Hi, I'm trying to submit a job to a small Spark cluster running in standalone mode, however it seems like the jar file I'm submitting to the cluster is "not found" by the worker nodes. I might have understood wrong, but I thought the driver node would send this jar file to the worker nodes,

Re: Missing dependencies when submitting scala app

2015-12-23 Thread Daniel Valdivia
son, > do you also specify org.json4s.jackson in your sbt dependency but with a > different version ? > > On Wed, Dec 23, 2015 at 6:15 AM, Daniel Valdivia <h...@danielvaldivia.com> > wrote: > >> Hi, >> >> I'm trying to figure out how to bundle depen
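A hedged build.sbt sketch of the version alignment the reply is asking about; all names and version numbers here are assumptions and should be matched to whatever the installed Spark distribution actually bundles:

    name := "incident-app"                                   // hypothetical project name
    scalaVersion := "2.10.6"
    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core"     % "1.5.2" % "provided",
      "org.json4s"       %% "json4s-jackson" % "3.2.10"      // assumed: aligned with the json4s version Spark ships
    )

A NoSuchMethodError at runtime usually means the classes compiled against one version of a library are running against another, which is why pinning the dependency to the cluster's version matters here.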

Missing dependencies when submitting scala app

2015-12-22 Thread Daniel Valdivia
Hi, I'm trying to figure out how to bundle dependencies with a Scala application. So far my code was tested successfully in the spark-shell, however now that I'm trying to run it as a standalone application, which I'm compiling with sbt, it is yielding the error: java.lang.NoSuchMethodError:
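One commonly used way to bundle dependencies into a single jar is sbt-assembly; a sketch under that assumption (the plugin version is illustrative):

    // project/plugins.sbt
    addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.1")

    // then `sbt assembly` produces a fat jar under target/scala-2.10/,
    // which is what gets passed to spark-submit instead of the plain package jar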

Scala VS Java VS Python

2015-12-16 Thread Daniel Valdivia
Hello, this is more of a "survey" question for the community; you can reply to me directly so we don't flood the mailing list. I'm having a hard time learning Spark using Python since the API seems to be slightly incomplete, so I'm looking at my options to start doing all my apps in either

Re: PairRDD(K, L) to multiple files by key serializing each value in L before

2015-12-16 Thread Daniel Valdivia
rough the values of this key,value pair > >for ele in line[1]: > > 4. Write every ele into the file created. > 5. Close the file. > > Do you think this works? > > Thanks > Abhishek S > > > Thank you! > > With Regards, > Abhishek S > > On Wed, De

Access row column by field name

2015-12-16 Thread Daniel Valdivia
Hi, I'm processing the JSON I have in a text file using DataFrames; however, right now I'm trying to figure out a way to access a certain value within the rows of my DataFrame if I only know the field name and not its position in the schema. I noticed that row.schema and
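A minimal sketch of looking a value up by field name on a Row; running in spark-shell where sqlContext is available is assumed, and the file name and the "category" field are illustrative:

    val df = sqlContext.read.json("incidents.json")          // assumed input path
    val categories = df.rdd.map { row =>
      row.getAs[String]("category")                          // look the value up by field name
      // equivalently: row.getString(row.fieldIndex("category"))
    }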

PairRDD(K, L) to multiple files by key serializing each value in L before

2015-12-15 Thread Daniel Valdivia
Hello everyone, I have a PairRDD with a set of keys and a list of values, where each value in the list is a JSON document I already loaded at the beginning of my Spark app. How can I iterate over each value of the list in my pair RDD to transform it to a string, and then save the whole content of the key to a file?
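A hedged Scala sketch of the per-key write described above; the pairRdd name, element types, and output path are assumptions, and the .toString call stands in for whatever JSON serialization the values actually need:

    import java.io.PrintWriter
    // pairRdd: RDD[(String, Seq[String])] is assumed to exist
    val perKey = pairRdd.mapValues(values => values.map(_.toString))
    perKey.collect().foreach { case (key, lines) =>          // collect() assumes the data fits on the driver
      val out = new PrintWriter(s"/tmp/output/$key.txt")     // output directory is an assumption
      try lines.foreach(out.println) finally out.close()
    }

For larger data, one file per key is often produced with saveAsHadoopFile and a MultipleTextOutputFormat subclass instead of collecting everything to the driver.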