Re: ordered ingestion not guaranteed

2018-05-11 Thread Jörn Franke
What DB do you have? You have some options, such as 1) use a key-value store (they can be accessed very efficiently) to check whether a newer value for the key has already been processed: if yes, ignore the value; if no, insert it into the database 2) redesign the key to include the timestamp and find out the
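A minimal sketch of option 1, assuming each record carries a key, an event timestamp, and a value. The in-memory dict stands in for the key-value store, and `make_latest_only_writer`, `write_if_newer`, and `upsert` are hypothetical names used only for illustration, not part of Spark or Kafka:

```python
from typing import Any, Callable, Dict


def make_latest_only_writer(
    upsert: Callable[[str, Any], None],
) -> Callable[[str, int, Any], bool]:
    """Build a writer that skips records older than the newest one seen per key.

    `upsert` is a hypothetical callback that writes to the downstream database.
    """
    # Assumption: a plain dict stands in for the external key-value store.
    latest_seen: Dict[str, int] = {}

    def write_if_newer(key: str, timestamp: int, value: Any) -> bool:
        previous = latest_seen.get(key)
        # Assumption: a record with an equal timestamp counts as already processed.
        if previous is not None and previous >= timestamp:
            return False  # a newer (or same-age) record was already written
        latest_seen[key] = timestamp
        upsert(key, value)
        return True

    return write_if_newer


# Usage: a dict plays the role of the downstream database.
database: Dict[str, Any] = {}
write = make_latest_only_writer(database.__setitem__)
results = [
    write("user-1", 200, "new"),  # first record for user-1, written
    write("user-1", 100, "old"),  # arrives late with an older timestamp, skipped
    write("user-2", 50, "x"),     # different key, written
]
```

In a real streaming job the lookup table would have to live in a store shared by all executors, and the check-then-write step would need to be atomic per key; the sketch leaves both out.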

ordered ingestion not guaranteed

2018-05-11 Thread ravidspark
Hi All, I am using Spark 2.2.0 and I have the use case below: *Reading from Kafka using Spark Streaming and updating (not just inserting) the records into a downstream database* I understand that Spark does not necessarily read messages from Kafka in the timestamp order in which they are stored in the Kafka partitions

Re: SPARK SQL: returns null for a column, while HIVE query returns data for the same column

2018-05-11 Thread ARAVIND ARUMUGHAM Sethurathnam
- this column was added in later partitions and is not present in earlier ones. - I assume partition pruning should just load from the particular partition I am specifying when using Spark SQL? - (Spark version 2.2) On Fri, May 11, 2018 at 2:24 PM, ARAVIND ARUMUGHAM

Re: Spark 2.3.0 Structured Streaming Kafka Timestamp

2018-05-11 Thread Michael Armbrust
Hmm yeah that does look wrong. Would be great if someone opened a PR to correct the docs :) On Thu, May 10, 2018 at 5:13 PM Yuta Morisawa wrote: > The problem is solved. > The actual schema of the Kafka message is different from the documentation.

SPARK SQL: returns null for a column, while HIVE query returns data for the same column

2018-05-11 Thread ARAVIND ARUMUGHAM Sethurathnam
I have a Hive table created on top of S3 data in Parquet format and partitioned by one column named eventdate. 1) When using a Hive query, it returns data for a column named "headertime", which is in the schema of both the table and the file. select headertime from dbName.test_bug where

Oozie with spark 2.3 in Kubernetes

2018-05-11 Thread purna pradeep
Hello, I would like to know if anyone has tried Oozie with Spark 2.3 actions on Kubernetes for scheduling Spark jobs. Thanks, Purna

UDTF registration fails for hiveEnabled SQLContext

2018-05-11 Thread Mick Davies
Hi, If I try to register a UDTF using SQLContext (with enableHiveSupport set) using the code: I get the following error: It works OK if I use the deprecated HiveContext. Is there a way to register a UDTF without using deprecated code? This is happening in some tests I am writing using but I