You could also enable it with --conf spark.logLineage=true if you do not
want to change any code.
Regards,
Keith.
http://keith-chapman.com
On Fri, Jul 21, 2017 at 7:57 PM, Keith Chapman
wrote:
> Hi Ron,
>
> You can try using the toDebugString method on the RDD, this
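A minimal sketch of the config route, for reference; the app name is a
placeholder, and only the spark.logLineage setting comes from the thread:

import org.apache.spark.{SparkConf, SparkContext}

// Equivalent to passing --conf spark.logLineage=true to spark-submit;
// when enabled, Spark logs each job's RDD lineage (toDebugString output).
val conf = new SparkConf().setAppName("lineage-demo").set("spark.logLineage", "true")
val sc = new SparkContext(conf)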
Hi Ron,
You can try using the toDebugString method on the RDD, this will print the
RDD lineage.
Regards,
Keith.
http://keith-chapman.com
On Fri, Jul 21, 2017 at 11:24 AM, Ron Gonzalez wrote:
> Hi,
> Can someone point me to a test case or share sample code
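A minimal sketch of the toDebugString suggestion; the input path and
transformations are placeholders:

// Build a small RDD chain, then print its lineage.
val rdd = sc.textFile("hdfs:///tmp/input.txt") // hypothetical input
  .flatMap(_.split(" "))
  .map(w => (w, 1))
  .reduceByKey(_ + _)
println(rdd.toDebugString) // prints the RDD lineage, one line per dependency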
Hi,
I have several Spark jobs, including both batch and streaming jobs, to
process the system log and analyze it. We are using Kafka as the pipeline
to connect the jobs.
After upgrading to Spark 2.1.0 + Spark Kafka Streaming 010, I found that some
of the jobs (both batch and streaming) throw the below
Hi,
I would like to call a method on JavaPairRDD from Scala, and I am not sure how
to write a function for the "map". I am using a third-party library that uses
Spark for geospatial computations, and it happens to return some results
through the Java API. I'd welcome a hint on how to write a
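One possible sketch, assuming the library returns a JavaPairRDD[String, Int]
(the element types are an assumption); unwrapping the underlying Scala RDD is
usually the simplest route:

import org.apache.spark.api.java.JavaPairRDD

def doubleValues(pairs: JavaPairRDD[String, Int]): JavaPairRDD[String, Int] =
  // .rdd exposes the Scala RDD[(K, V)]; fromRDD wraps the mapped result back up
  JavaPairRDD.fromRDD(pairs.rdd.map { case (k, v) => (k, v * 2) })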
Hi,
This is the first time I am trying structured streaming with Kafka. I have
simple code to read from Kafka and display it on the console. The message is
in JSON format. However, when I run my code, nothing after the below line gets
printed.
17/07/21 13:43:41 INFO AppInfoParser: Kafka commitId :
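For comparison, a minimal sketch of such a job; the broker address and topic
name are assumptions:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("kafka-console").getOrCreate()
spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092") // assumed broker
  .option("subscribe", "test-topic")                   // assumed topic
  .load()
  .selectExpr("CAST(value AS STRING)")                 // the raw JSON payload as text
  .writeStream
  .format("console")
  .start()
  .awaitTermination() // without this, the driver can exit before anything prints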
Hi Xiao,
I am trying the JSON sample table provided by Oracle 12c. It is on the website -
https://docs.oracle.com/database/121/ADXDB/json.htm#ADXDB6371
CREATE TABLE j_purchaseorder
(id RAW (16) NOT NULL,
date_loaded TIMESTAMP WITH TIME ZONE,
po_document CLOB
CONSTRAINT
Please unsubscribe
Thanks
--
*Cornelio Iñigo*
How about creating a partition column and using it?
On Sat, 22 Jul 2017 at 2:47 am, Jain, Nishit wrote:
> Is it possible to have the Spark DataFrame Writer write based on
> RangePartitioning?
>
> For Ex -
>
> I have 10 distinct values for column_a, say 1 to 10.
>
> df.write
>
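A sketch of that suggestion, assuming column_a holds 1 to 10 and bucketing it
into two ranges; the column names and output path are placeholders:

import org.apache.spark.sql.functions._

// 1-5 -> bucket 0, 6-10 -> bucket 1; partitionBy then writes one folder per bucket.
val withRange = df.withColumn("range_bucket", floor((col("column_a") - 1) / 5))
withRange.write.partitionBy("range_bucket").parquet("/tmp/out") // hypothetical path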
Hi,
I'm having problems converting decimal data from Spark SQL to Avro.
I'm trying to write an Avro file with the following schema for the decimal
field from a Spark application:
{ "name":"num", "type": ["null", {"type": "bytes", "logicalType":
"decimal", "precision": 3, "scale": 1}], "doc":"durata" }
Hi, Can someone point me to a test case or share sample code that is able to
extract the RDD graph from a Spark job anywhere during its lifecycle? I
understand that Spark has UI that can show the graph of the execution so I'm
hoping that is using some API somewhere that I could use. I know
Please unsubscribe me.
Could you share the schema of your Oracle table and open a JIRA?
Thanks!
Xiao
2017-07-21 9:40 GMT-07:00 Cassa L :
> I am using 2.2.0. I resolved the problem by removing SELECT * and adding
> column names to the SELECT statement. That works. I'm wondering why SELECT
> * will
Is it possible to have the Spark DataFrame Writer write based on RangePartitioning?
For Ex -
I have 10 distinct values for column_a, say 1 to 10.
df.write
.partitionBy("column_a")
The above code will by default create 10 folders: column_a=1, column_a=2,
..., column_a=10.
I want to see if it is possible
I am using 2.2.0. I resolved the problem by removing SELECT * and adding
column names to the SELECT statement. That works. I'm wondering why SELECT
* will not work.
Regards,
Leena
On Fri, Jul 21, 2017 at 8:21 AM, Xiao Li wrote:
> Could you try 2.2? We fixed multiple
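A sketch of the workaround Leena describes; the JDBC URL, credentials, and
column list are assumptions:

val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:oracle:thin:@//dbhost:1521/orcl")                 // assumed URL
  .option("dbtable", "(SELECT id, date_loaded FROM j_purchaseorder) po") // explicit columns instead of SELECT *
  .option("user", "scott")     // placeholder credentials
  .option("password", "tiger")
  .load()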
On Fri, Jul 21, 2017 at 5:00 AM, Gokula Krishnan D wrote:
> Is there any way we can set up the scheduler mode at the Spark cluster
> level, besides the application (SparkContext) level?
That's the job of the cluster (or resource) manager: e.g., configure
separate queues in YARN with a maximum number
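For instance, a sketch of pinning an application to a dedicated YARN queue
(the queue name is an assumption):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.yarn.queue", "analytics") // per-queue limits are enforced by YARN itself
  .set("spark.scheduler.mode", "FAIR")  // this only schedules jobs *within* one application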
What is a good way to support non-homogeneous input data in structured
streaming?
Let me explain the use case that we are trying to solve. We are reading data
from 3 topics in Kafka. All the topics have data in Avro format, with each of
them having its own schema. Now, all the 3 Avro schemas
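One sketch of consuming the three topics in a single stream and telling the
records apart; the broker and topic names are assumptions:

import org.apache.spark.sql.functions.col

val raw = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092") // assumed broker
  .option("subscribe", "topicA,topicB,topicC")         // assumed topic names
  .load()
// The built-in "topic" column identifies the source, so each record can be
// routed to the matching Avro schema downstream.
val tagged = raw.select(col("topic"), col("value"))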
Could you try 2.2? We fixed multiple Oracle related issues in the latest
release.
Thanks
Xiao
On Wed, 19 Jul 2017 at 11:10 PM Cassa L wrote:
> Hi,
> I am trying to use Spark to read from Oracle (12.1) table using Spark 2.0.
> My table has JSON data. I am getting below
The SCC (Spark Cassandra Connector) includes the Java driver, which means you
could just use Java driver functions. It also provides a serializable wrapper
that has session and prepared-statement pooling. Something like
val cc = CassandraConnector(sc.getConf)
SomeFunctionWithAnIterator { it: SomeIterator =>
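A hedged completion of that pattern; the RDD, keyspace, table, and CQL are
placeholders:

import com.datastax.spark.connector.cql.CassandraConnector

val cc = CassandraConnector(sc.getConf)
someRdd.foreachPartition { it =>  // someRdd: RDD[(String, Int)], hypothetical
  cc.withSessionDo { session =>   // pooled session, safe to open per partition
    val stmt = session.prepare("INSERT INTO ks.tbl (k, v) VALUES (?, ?)") // hypothetical CQL
    it.foreach { case (k, v) => session.execute(stmt.bind(k, v: java.lang.Integer)) }
  }
}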
Mark & Ayan, thanks for the inputs.
*Is there any way we can set up the scheduler mode at the Spark cluster level,
besides the application (SparkContext) level?*
Currently, YARN is in FAIR mode, and we manually ensure that the Spark
application is also in FAIR mode; however, we noticed that applications are
not releasing the