Re: Joining streaming data with static table data.

2017-12-11 Thread Vikash Pareek
Hi Satyajit, For the query/join part there are a couple of approaches. 1. Create a DataFrame from each incoming streaming batch (which is actually an RDD) and join it with your reference data (coming from the existing table). 2. You can use Structured Streaming, which basically carries the schema in every batch

Re: Joining streaming data with static table data.

2017-12-11 Thread Rishi Mishra
You can do a join between a streaming dataset and a static dataset. I would prefer your first approach, but the problem with this approach is performance: unless you cache the dataset, every time you fire a join query it will fetch the latest records from the table. Regards, Rishitesh Mishra,
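Rishi's caching point can be illustrated outside Spark with a toy sketch (plain Python, hypothetical names, not Spark APIs): without caching, the static side is re-read for every micro-batch join; with a cached copy, it is read once.

```python
# Toy illustration of why caching the static side of a stream-static
# join matters: an uncached lookup re-reads the reference table on
# every micro-batch; a cached one reads it once.

fetch_count = 0

def load_reference_table():
    """Stand-in for reading the static table from storage."""
    global fetch_count
    fetch_count += 1
    return {1: "alice", 2: "bob"}

def join_batch(batch, ref):
    """Enrich each streaming record id with the reference data."""
    return [(rid, ref.get(rid)) for rid in batch]

# Uncached: the table is fetched once per batch.
for batch in [[1], [2], [1, 2]]:
    join_batch(batch, load_reference_table())
uncached_fetches = fetch_count

# Cached: the table is fetched once and reused across batches.
fetch_count = 0
cached_ref = load_reference_table()
for batch in [[1], [2], [1, 2]]:
    join_batch(batch, cached_ref)
cached_fetches = fetch_count
```

In Spark itself the analogous move is caching the static DataFrame so repeated micro-batch joins do not hit the source table each time.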

pyspark.sql.utils.AnalysisException: u'Left outer/semi/anti joins with a streaming DataFrame/Dataset on the right is not supported;

2017-12-11 Thread salemi
Hi All, I am having trouble joining two Structured Streaming DataFrames. I am getting the following error: pyspark.sql.utils.AnalysisException: u'Left outer/semi/anti joins with a streaming DataFrame/Dataset on the right is not supported; Is there another way to join two streaming DataFrames
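For reference, what a left outer join produces, and why swapping the operands changes which side ends up "on the right", can be sketched in plain Python (toy code, not Spark; which join shapes are supported for streaming inputs depends on the Spark version):

```python
def left_outer_join(left, right, key):
    """Minimal left outer join over lists of dicts (plain Python, not Spark)."""
    # Index the right side by key for lookup.
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    out = []
    for l in left:
        matches = index.get(l[key])
        if matches:
            for r in matches:
                out.append({**l, **r})   # matched: merge both sides
        else:
            out.append({**l})            # unmatched: keep left row as-is
    return out

stream_side = [{"id": 1, "v": "a"}, {"id": 3, "v": "c"}]
static_side = [{"id": 1, "name": "x"}]

# With the (hypothetically) streaming side on the LEFT, every streaming
# record survives the join; only matched records gain the right-side columns.
rows = left_outer_join(stream_side, static_side, "id")
```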

How Fault Tolerance is achieved in Spark ??

2017-12-11 Thread Nikhil.R.Patil
Hello techies, How is fault tolerance achieved in Spark when data is read from HDFS and held in memory as RDDs? Regards, Nikhil
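The usual answer is lineage: rather than replicating in-memory data, Spark records the chain of transformations that produced each RDD partition and replays it from the source on failure. A toy sketch of that idea (plain Python, hypothetical class, not Spark internals):

```python
# Toy sketch of RDD-style fault tolerance: Spark records the lineage
# (source + transformations) and can recompute a lost partition from
# the source, instead of replicating the in-memory data.

class ToyRDD:
    def __init__(self, source, transforms=()):
        self.source = source          # e.g. the HDFS block to re-read
        self.transforms = transforms  # lineage: functions applied in order

    def map(self, fn):
        # Transformations are lazy: only the lineage grows.
        return ToyRDD(self.source, self.transforms + (fn,))

    def compute(self):
        """(Re)compute the data from source + lineage; losing the
        in-memory result costs only a recomputation."""
        data = list(self.source)
        for fn in self.transforms:
            data = [fn(x) for x in data]
        return data

rdd = ToyRDD([1, 2, 3]).map(lambda x: x * 10)
first = rdd.compute()      # normal run
recovered = rdd.compute()  # after a simulated executor loss: replay lineage
```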

Json to csv

2017-12-11 Thread Prabha K
Any help on converting json to csv, or flattening the json file? The json file has one struct and multiple arrays. Thanks, PK
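One common approach, sketched here with only the Python standard library (a minimal example, not a full solution): flatten nested structs with dotted keys, explode arrays by index, then write the flat record with `csv`.

```python
import csv
import io
import json

def flatten(obj, prefix=""):
    """Recursively flatten nested objects; array elements get index keys."""
    out = {}
    if isinstance(obj, dict):
        for k, v in obj.items():
            out.update(flatten(v, f"{prefix}{k}."))
    elif isinstance(obj, list):
        for i, v in enumerate(obj):
            out.update(flatten(v, f"{prefix}{i}."))
    else:
        out[prefix[:-1]] = obj  # drop the trailing dot
    return out

record = json.loads('{"user": {"name": "pk"}, "tags": ["a", "b"]}')
flat = flatten(record)
# {"user.name": "pk", "tags.0": "a", "tags.1": "b"}

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=sorted(flat))
writer.writeheader()
writer.writerow(flat)
csv_text = buf.getvalue()
```

In Spark the equivalent tools would be `explode` for the arrays and dotted column selection for the struct fields.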

Joining streaming data with static table data.

2017-12-11 Thread satyajit vegesna
Hi All, I am working on a real-time reporting project and I have a question about a Structured Streaming job that is going to stream a particular table's records and would have to join to an existing table. Stream > query/join to another DF/DS ---> update the stream data record. Now I have a

Writing a UDF that works with an Interval in PySpark

2017-12-11 Thread Daniel Haviv
Hi, I'm trying to write a variant of date_add that accepts an interval as its second parameter, so that I could use the following syntax in Spark SQL: select date_add(cast('1970-01-01' as date), interval 1 day) but I'm getting the following error: ValueError: (ValueError(u'Could not parse datatype:
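One workaround, since a PySpark UDF cannot take Spark's interval type directly: pass the interval as a string (or a day count) and do the arithmetic with `datetime.timedelta` inside the UDF. A minimal sketch of the UDF body (hypothetical helper names; the tiny parser below handles only day intervals, far less than Spark's interval type):

```python
import re
from datetime import date, timedelta

def parse_interval(text):
    """Tiny parser for strings like 'interval 1 day' / 'interval 2 days'.
    (Hypothetical helper; Spark's interval type is much richer.)"""
    m = re.fullmatch(r"interval\s+(\d+)\s+day(s)?", text.strip())
    if not m:
        raise ValueError(f"unsupported interval: {text!r}")
    return timedelta(days=int(m.group(1)))

def date_add_interval(d, interval_text):
    """Candidate UDF body: add a day-interval to a date."""
    return d + parse_interval(interval_text)

result = date_add_interval(date(1970, 1, 1), "interval 1 day")
```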

Re: Infer JSON schema in structured streaming Kafka.

2017-12-11 Thread Burak Yavuz
In Spark 2.2, you can read from Kafka in batch mode, and then use the json reader to infer the schema: val df = spark.read.format("kafka")... .load().select($"value".cast("string")).as[String] val json = spark.read.json(df) val schema = json.schema While the above can be slow (since you're reading almost all data
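The core of Burak's trick is deriving a schema from a sample of the JSON messages. A toy version of that inference step in plain Python (a stand-in for what `spark.read.json` does, with types merged crudely on conflict; not Spark's actual algorithm):

```python
import json

def infer_schema(json_lines):
    """Infer a flat field -> Python-type-name mapping from a sample of
    JSON strings. Fields seen with conflicting types widen to 'str'."""
    schema = {}
    for line in json_lines:
        for field, value in json.loads(line).items():
            t = type(value).__name__
            if field in schema and schema[field] != t:
                schema[field] = "str"   # crude conflict resolution
            else:
                schema.setdefault(field, t)
    return schema

# A sampled "batch" of Kafka message values:
sample = ['{"id": 1, "name": "a"}', '{"id": 2, "score": 0.5}']
schema = infer_schema(sample)
```

In Spark, the inferred schema would then be passed to `from_json` when parsing the streaming values.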

Re: Loading a spark dataframe column into T-Digest using java

2017-12-11 Thread Marcelo Vanzin
The closure in your "foreach" loop runs in a remote executor, not the local JVM, so it's updating its own copy of the t-digest instance. The one on the driver side is never touched. On Sun, Dec 10, 2017 at 10:27 PM, Himasha de Silva wrote: > Hi, > > I want to load a spark
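The closure-copy behaviour Marcelo describes can be simulated in plain Python: Spark serializes the closure and ships a copy to each executor, so mutations happen on the copy. Here a `deepcopy` plays the role of serialization (toy stand-in classes, not Spark or t-digest APIs):

```python
import copy

class TDigestLike:
    """Tiny stand-in for a t-digest: just accumulates values."""
    def __init__(self):
        self.values = []
    def add(self, v):
        self.values.append(v)

driver_digest = TDigestLike()

def run_task_on_executor(task_fn, closure_state):
    # Spark serializes the closure and ships a COPY to the executor;
    # we simulate that shipping with a deep copy.
    executor_copy = copy.deepcopy(closure_state)
    task_fn(executor_copy)
    # The executor's copy is discarded; nothing flows back to the driver.

run_task_on_executor(lambda d: d.add(42), driver_digest)
print(len(driver_digest.values))  # 0 -- the driver-side digest was never touched
```

The usual fixes are to aggregate on the executors and collect the results, or to use an accumulator-style mechanism, rather than mutating driver-side state from inside `foreach`.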

Re: Infer JSON schema in structured streaming Kafka.

2017-12-11 Thread satyajit vegesna
Hi Burak, Thank you for the inputs; we will definitely try the options. The reason we don't have a unified schema is that we are trying to consume data from different topics that contain data from different tables in a DB, so each table has different columns. Regards, Satyajit. On

unsubscribe

2017-12-11 Thread Malcolm Croucher

Spark Structured Streaming how to read data from AWS SQS

2017-12-11 Thread Bogdan Cojocar
For Spark Streaming there are connectors that can achieve this functionality. Unfortunately, for Spark Structured Streaming I couldn't find any, as it's a newer technology. Is there a way to connect to a source using a Spark Streaming connector? Or is

Re: ML Transformer: create feature that uses multiple columns

2017-12-11 Thread davideanastasia
Hi Filipp, your solution worked very well: thanks a lot! Davide

Re: Why Spark 2.2.1 still bundles old Hive jars?

2017-12-11 Thread Jacek Laskowski
Hi, https://issues.apache.org/jira/browse/SPARK-19076 Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski Spark Structured Streaming https://bit.ly/spark-structured-streaming Mastering Apache Spark 2 https://bit.ly/mastering-apache-spark Follow me at

Re: Infer JSON schema in structured streaming Kafka.

2017-12-11 Thread Jacek Laskowski
Hi, What about a custom streaming Sink that would stop the query after addBatch has been called? Pozdrawiam, Jacek Laskowski

Re: Infer JSON schema in structured streaming Kafka.

2017-12-11 Thread satyajit vegesna
Hi Jacek, For now, I am using Thread.sleep() on the driver to make sure my streaming query receives some data, and then stop it before control reaches the query on the memory table. Let me know if there is any better way of handling it. Regards, Satyajit. On Sun, Dec 10, 2017 at 10:43 PM, satyajit