Re: Get full RDD lineage for a spark job

2017-07-21 Thread Keith Chapman
You could also enable it with --conf spark.logLineage=true if you do not want to change any code. Regards, Keith. http://keith-chapman.com On Fri, Jul 21, 2017 at 7:57 PM, Keith Chapman wrote: > Hi Ron, > > You can try using the toDebugString method on the RDD, this
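A minimal sketch of the programmatic equivalent, assuming a hypothetical application name: setting spark.logLineage on the SparkConf has the same effect as passing --conf spark.logLineage=true to spark-submit.

import org.apache.spark.{SparkConf, SparkContext}

// spark.logLineage=true makes Spark log each job's RDD lineage (toDebugString output)
val conf = new SparkConf()
  .setAppName("lineage-demo")          // hypothetical application name
  .set("spark.logLineage", "true")
val sc = new SparkContext(conf)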

Re: Get full RDD lineage for a spark job

2017-07-21 Thread Keith Chapman
Hi Ron, You can try using the toDebugString method on the RDD, this will print the RDD lineage. Regards, Keith. http://keith-chapman.com On Fri, Jul 21, 2017 at 11:24 AM, Ron Gonzalez wrote: > Hi, > Can someone point me to a test case or share sample code
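A minimal sketch, assuming a hypothetical RDD built in the spark-shell (sc available); toDebugString prints the lineage of the RDD and its parents.

// Build a small chain of transformations and print its lineage
val rdd = sc.parallelize(1 to 100).map(_ * 2).filter(_ % 4 == 0)
println(rdd.toDebugString)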

Spark job crashes intermittently with File Not Found during shuffle

2017-07-21 Thread Martin Peng
Hi, I have several Spark jobs, including both batch and streaming jobs, that process the system log and analyze it. We are using Kafka as the pipeline to connect each job. After upgrading to Spark 2.1.0 + Spark Kafka Streaming 010, I found that some of the jobs (both batch and streaming) throw the below

[Spark] Working with JavaPairRDD from Scala

2017-07-21 Thread Lukasz Tracewski
Hi, I would like to call a method on a JavaPairRDD from Scala and I am not sure how to write a function for the "map". I am using a third-party library that uses Spark for geospatial computations, and it happens that it returns some results through the Java API. I'd welcome a hint on how to write a
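A minimal sketch under the assumption that the third-party library returns a hypothetical JavaPairRDD[String, java.lang.Double]; the .rdd accessor unwraps it into a plain Scala RDD of pairs, after which an ordinary Scala lambda works in map.

import org.apache.spark.api.java.JavaPairRDD
import org.apache.spark.rdd.RDD

def process(javaPairs: JavaPairRDD[String, java.lang.Double]): RDD[(String, Double)] = {
  // Unwrap the Java wrapper into the underlying Scala RDD[(K, V)]
  val scalaRdd: RDD[(String, java.lang.Double)] = javaPairs.rdd
  scalaRdd.map { case (key, value) => (key, value.doubleValue * 2) }
}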

Fwd: Spark Structured Streaming - Spark Consumer does not display messages

2017-07-21 Thread Cassa L
Hi, This is the first time I am trying Structured Streaming with Kafka. I have simple code to read from Kafka and display it on the console. The message is in JSON format. However, when I run my code, nothing after the below line gets printed. 17/07/21 13:43:41 INFO AppInfoParser: Kafka commitId :

Spark Structured Streaming - Spark Consumer does not display messages

2017-07-21 Thread Cassa L
Hi, This is the first time I am trying Structured Streaming with Kafka. I have simple code to read from Kafka and display it on the console. The message is in JSON format. However, when I run my code, nothing after the below line gets printed. 17/07/21 13:43:41 INFO AppInfoParser: Kafka commitId :
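A minimal sketch of this kind of job, with hypothetical broker and topic names; the Kafka value is cast to a string and the query is written to the console sink. Note that console output goes to the driver's stdout, and the source requires the spark-sql-kafka-0-10 package.

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("kafka-console").getOrCreate()

val messages = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")   // hypothetical brokers
  .option("subscribe", "events")                      // hypothetical topic
  .load()
  .selectExpr("CAST(value AS STRING) AS json")        // raw JSON payload as a string

val query = messages.writeStream
  .format("console")
  .outputMode("append")
  .start()

query.awaitTermination()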

Re: Spark 2.0 and Oracle 12.1 error

2017-07-21 Thread Cassa L
Hi Xiao, I am trying the JSON sample table provided by Oracle 12c. It is on the website - https://docs.oracle.com/database/121/ADXDB/json.htm#ADXDB6371 CREATE TABLE j_purchaseorder (id RAW (16) NOT NULL, date_loaded TIMESTAMP WITH TIME ZONE, po_document CLOB CONSTRAINT

unsubscribe

2017-07-21 Thread Cornelio Iñigo
Please unsubscribe. Thanks -- *Cornelio Iñigo*

Re: Spark Data Frame Writer - Range Partitioning

2017-07-21 Thread ayan guha
How about creating a partition column and using it? On Sat, 22 Jul 2017 at 2:47 am, Jain, Nishit wrote: > Is it possible to have the Spark Data Frame Writer write based on > RangePartitioning? > > For Ex - > > I have 10 distinct values for column_a, say 1 to 10. > > df.write >
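A minimal sketch of the suggestion above, with hypothetical range bounds and output path, where df is the DataFrame from the question: derive a bucket column from column_a and partition the output by the bucket instead of the raw value.

import org.apache.spark.sql.functions._

// Two hypothetical range buckets over the 1..10 values of column_a
val bucketed = df.withColumn(
  "range_bucket",
  when(col("column_a") <= 5, "1-5").otherwise("6-10"))

bucketed.write
  .partitionBy("range_bucket")          // folders range_bucket=1-5 and range_bucket=6-10
  .parquet("/tmp/range_partitioned")    // hypothetical output path/format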

[ Spark SQL ] Conversion from Spark SQL to Avro decimal

2017-07-21 Thread Ernesto Valentino
Hi, I'm having problems converting decimal data from Spark SQL to Avro. I'm trying to write an Avro file with the following schema for the decimal field from a Spark application: { "name":"num", "type": ["null", {"type": "bytes", "logicalType": "decimal", "precision": 3, "scale": 1}], "doc":"durata" }
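A minimal sketch, assuming the com.databricks:spark-avro package is on the classpath; whether DecimalType is written out as the Avro decimal logical type (bytes with precision/scale) or as a plain type depends on the spark-avro version, so verify the resulting schema with avro-tools.

// Column typed as DECIMAL(3,1) to match the schema above
val df = spark.sql("SELECT CAST(12.3 AS DECIMAL(3,1)) AS num")
df.write
  .format("com.databricks.spark.avro")
  .save("/tmp/decimal_avro")            // hypothetical output path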

Get full RDD lineage for a spark job

2017-07-21 Thread Ron Gonzalez
Hi, Can someone point me to a test case or share sample code that is able to extract the RDD graph from a Spark job anywhere during its lifecycle? I understand that Spark has a UI that can show the graph of the execution, so I'm hoping that it uses some API somewhere that I could use. I know
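Besides rdd.toDebugString (see the replies above), a minimal sketch of one programmatic route, with a hypothetical listener name: a SparkListener receives the per-stage RDDInfo objects, whose id and parentIds fields describe the RDD graph the UI draws.

import org.apache.spark.scheduler.{SparkListener, SparkListenerJobStart}

class RddGraphListener extends SparkListener {
  override def onJobStart(jobStart: SparkListenerJobStart): Unit = {
    for (stage <- jobStart.stageInfos; rdd <- stage.rddInfos) {
      // Each RDDInfo carries its id, name and the ids of its parent RDDs
      println(s"stage ${stage.stageId}: RDD ${rdd.id} (${rdd.name}) <- parents ${rdd.parentIds.mkString(", ")}")
    }
  }
}

// sc.addSparkListener(new RddGraphListener)   // register before running jobs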

Unsubscribe

2017-07-21 Thread Siddhartha Khaitan
Please unsubscribe me.

Re: Spark 2.0 and Oracle 12.1 error

2017-07-21 Thread Xiao Li
Could you share the schema of your Oracle table and open a JIRA? Thanks! Xiao 2017-07-21 9:40 GMT-07:00 Cassa L : > I am using 2.2.0. I resolved the problem by removing SELECT * and adding > column names to the SELECT statement. That works. I'm wondering why SELECT > * will

Spark Data Frame Writer - Range Partitioning

2017-07-21 Thread Jain, Nishit
Is it possible to have the Spark Data Frame Writer write based on RangePartitioning? For Ex - I have 10 distinct values for column_a, say 1 to 10. df.write .partitionBy("column_a") The above code will by default create 10 folders: column_a=1, column_a=2 ... column_a=10. I want to see if it is possible

Re: Spark 2.0 and Oracle 12.1 error

2017-07-21 Thread Cassa L
I am using 2.2.0. I resolved the problem by removing SELECT * and adding column names to the SELECT statement. That works. I'm wondering why SELECT * will not work. Regards, Leena On Fri, Jul 21, 2017 at 8:21 AM, Xiao Li wrote: > Could you try 2.2? We fixed multiple
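A minimal sketch of the workaround described above, with hypothetical connection details: push an explicit column list down to Oracle by using a subquery as the dbtable, instead of letting Spark issue SELECT *.

val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCL")   // hypothetical URL
  .option("dbtable", "(SELECT id, date_loaded, po_document FROM j_purchaseorder) t")
  .option("user", "username")                              // hypothetical credentials
  .option("password", "password")
  .load()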

Re: Spark on Cloudera Configuration (Scheduler Mode = FAIR)

2017-07-21 Thread Marcelo Vanzin
On Fri, Jul 21, 2017 at 5:00 AM, Gokula Krishnan D wrote: > Is there any way we can set up the scheduler mode at the Spark cluster level > besides the application (SC) level? That's called the cluster (or resource) manager. e.g., configure separate queues in YARN with a maximum number
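A minimal sketch of how the two levels are set, with a hypothetical YARN queue name: spark.scheduler.mode only controls fair scheduling between jobs inside one application, while cluster-wide limits (such as maximum resources per queue) are configured in the resource manager itself, e.g. YARN's fair or capacity scheduler.

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("fair-app")                 // hypothetical application name
  .set("spark.scheduler.mode", "FAIR")    // FAIR scheduling within this application
  .set("spark.yarn.queue", "analytics")   // hypothetical YARN queue; its caps live in YARN config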

Supporting columns with heterogenous data

2017-07-21 Thread Lalwani, Jayesh
What is a good way to support non-homogeneous input data in Structured Streaming? Let me explain the use case that we are trying to solve. We are reading data from 3 topics in Kafka. All the topics have data in Avro format, with each of them having its own schema. Now, all the 3 Avro schemas
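A minimal sketch of one way to handle this, with hypothetical broker and topic names; the Kafka source exposes a topic column, so a single stream can be filtered per topic and each branch parsed with its own schema.

import org.apache.spark.sql.functions.col

val raw = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")   // hypothetical brokers
  .option("subscribe", "topicA,topicB,topicC")        // hypothetical topic names
  .load()

// One branch per topic; deserialize each branch's Avro payload with its own schema
val topicA = raw.filter(col("topic") === "topicA").select(col("value"))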

Re: Spark 2.0 and Oracle 12.1 error

2017-07-21 Thread Xiao Li
Could you try 2.2? We fixed multiple Oracle-related issues in the latest release. Thanks Xiao On Wed, 19 Jul 2017 at 11:10 PM Cassa L wrote: > Hi, > I am trying to use Spark to read from an Oracle (12.1) table using Spark 2.0. > My table has JSON data. I am getting below

Re: Spark (SQL / Structured Streaming) Cassandra - PreparedStatement

2017-07-21 Thread Russell Spitzer
The SCC (Spark Cassandra Connector) includes the Java driver, which means you could just use Java driver functions. It also provides a serializable wrapper which has session and prepared statement pooling. Something like: val cc = CassandraConnector(sc.getConf) SomeFunctionWithAnIterator{ it: SomeIterator =>
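A fuller sketch of the idea above, assuming the DataStax spark-cassandra-connector and a hypothetical keyspace, table and RDD of (Int, String) pairs: withSessionDo hands out a pooled session, and the prepared statement is reused for all rows in the partition.

import com.datastax.spark.connector.cql.CassandraConnector

val connector = CassandraConnector(sc.getConf)
someRdd.foreachPartition { rows =>                     // someRdd: RDD[(Int, String)], hypothetical
  connector.withSessionDo { session =>
    val stmt = session.prepare("INSERT INTO ks.tbl (id, value) VALUES (?, ?)")
    rows.foreach { case (id, value) =>
      session.execute(stmt.bind(Int.box(id), value))
    }
  }
}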

Re: Spark on Cloudera Configuration (Scheduler Mode = FAIR)

2017-07-21 Thread Gokula Krishnan D
Mark & Ayan, thanks for the inputs. *Is there anyway can we setup the scheduler mode in Spark Cluster level besides application (SC level).* Currently in YARN is in FAIR mode and manually we ensure that Spark Application also in FAIR mode however noticed that Applications are not releasing the