Challenges with Datasource V2 API

2019-06-25 Thread Sunita Arvind
Hello Spark Experts, I am having challenges using the DataSource V2 API. I created a mock data source. The input partitions seem to be created correctly; the output below confirms that:

    19/06/23 16:00:21 INFO root: createInputPartitions
    19/06/23 16:00:21 INFO root: Create a partition for abc

The …
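For orientation, a minimal mock read path against the Spark 2.4-era DataSource V2 API (the API was reworked again in 3.0) might look like the sketch below; the class names and the "abc" partition are illustrative, matching the log lines above.

    import java.util.{List => JList}
    import scala.collection.JavaConverters._
    import org.apache.spark.sql.catalyst.InternalRow
    import org.apache.spark.sql.sources.v2.{DataSourceOptions, DataSourceV2, ReadSupport}
    import org.apache.spark.sql.sources.v2.reader.{DataSourceReader, InputPartition, InputPartitionReader}
    import org.apache.spark.sql.types.{StringType, StructField, StructType}
    import org.apache.spark.unsafe.types.UTF8String

    // Entry point: loaded via spark.read.format(classOf[MockSource].getName).load()
    class MockSource extends DataSourceV2 with ReadSupport {
      override def createReader(options: DataSourceOptions): DataSourceReader = new MockReader
    }

    class MockReader extends DataSourceReader {
      override def readSchema(): StructType = StructType(Seq(StructField("value", StringType)))
      // Planning side (driver): this is where "createInputPartitions"-style logging runs.
      override def planInputPartitions(): JList[InputPartition[InternalRow]] =
        Seq[InputPartition[InternalRow]](new MockPartition("abc")).asJava
    }

    // Serializable description of one partition; shipped to executors.
    class MockPartition(name: String) extends InputPartition[InternalRow] {
      override def createPartitionReader(): InputPartitionReader[InternalRow] =
        new MockPartitionReader(name)
    }

    // Executor side: emits a single row containing the partition name.
    class MockPartitionReader(name: String) extends InputPartitionReader[InternalRow] {
      private var consumed = false
      override def next(): Boolean = if (consumed) false else { consumed = true; true }
      override def get(): InternalRow = InternalRow(UTF8String.fromString(name))
      override def close(): Unit = ()
    }

One frequent stumbling block: the InputPartition objects are serialized out to executors, so non-serializable state captured in them fails at run time even though planning appears to succeed.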

Problem with the ML ALS algorithm

2019-06-25 Thread Steve Pruitt
I get an inexplicable exception when trying to build an ALSModel with implicitPrefs set to true. I can’t find any help online. Thanks in advance. My code is:

    ALS als = new ALS()
        .setMaxIter(5)
        .setRegParam(0.01)
        .setUserCol("customer")
        …
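A Scala sketch of the same pipeline with the implicit-feedback switch spelled out; the item/rating column names and the toy data are assumptions, since the original snippet is cut off:

    import org.apache.spark.ml.recommendation.ALS
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().master("local[*]").appName("als-implicit").getOrCreate()
    import spark.implicits._

    // Toy interaction data: (customer, item, strength of interaction).
    val training = Seq((1, 10, 1.0), (1, 20, 3.0), (2, 10, 2.0))
      .toDF("customer", "item", "rating")

    val als = new ALS()
      .setMaxIter(5)
      .setRegParam(0.01)
      .setImplicitPrefs(true)   // rating column is treated as confidence, not a score
      .setUserCol("customer")
      .setItemCol("item")       // assumed column name
      .setRatingCol("rating")   // assumed column name

    val model = als.fit(training)

Two frequent causes of opaque fit-time exceptions, with or without implicitPrefs: null values in the user/item/rating columns, and user or item ids that are not representable in Integer range.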

Spark Structured Streaming Custom Sources confusion

2019-06-25 Thread Lars Francke
Hi, I'm a bit confused about the current state and the future plans of custom data sources in Structured Streaming. For DStreams we could write a Receiver, as documented. Can this be used with Structured Streaming? Then we had the DataSource API with DefaultSource et al., which was (in my …
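On the first question: a Receiver plugs into the DStream machinery only; Structured Streaming has no hook for it and instead resolves sources by provider name through the DataSource APIs. A sketch of the contrast (class names illustrative):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.receiver.Receiver

    // DStream-only: pushes records into Spark Streaming via store().
    class ConstantReceiver extends Receiver[String](StorageLevel.MEMORY_ONLY) {
      override def onStart(): Unit = new Thread("constant-receiver") {
        override def run(): Unit = while (!isStopped()) { store("tick"); Thread.sleep(1000) }
      }.start()
      override def onStop(): Unit = ()  // the worker thread polls isStopped()
    }

    object Contrast {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().master("local[2]").appName("contrast").getOrCreate()

        // DStream world: receiverStream accepts the Receiver above.
        val ssc = new StreamingContext(spark.sparkContext, Seconds(10))
        ssc.receiverStream(new ConstantReceiver).print()

        // Structured Streaming world: sources are looked up by provider name
        // (here the built-in "rate" source); custom ones implement the
        // DataSource interfaces instead of Receiver.
        val df = spark.readStream.format("rate").option("rowsPerSecond", "1").load()
        df.printSchema()
      }
    }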

Distinguishing between field missing and null in individual record?

2019-06-25 Thread Jeff Evans
Suppose we have the following JSON, which we parse into a DataFrame (using the multiline option).

    [{
      "id": 8541,
      "value": "8541 changed again value"
    },{
      "id": 51109,
      "name": "newest bob",
      "value": "51109 changed again"
    }]

Regardless of whether we explicitly define a schema or allow it …
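A sketch of the behaviour in question, with the second record altered to carry an explicit null for contrast: after parsing, a missing "name" and "name": null both surface as null, so the distinction has to be recovered from the raw text (the rlike test below is one crude way):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.col

    val spark = SparkSession.builder().master("local[*]").appName("json-null").getOrCreate()
    import spark.implicits._

    val raw = Seq(
      """{"id": 8541, "value": "8541 changed again value"}""",           // "name" absent
      """{"id": 51109, "name": null, "value": "51109 changed again"}"""  // "name" explicitly null
    ).toDS()

    // Parsed view: both rows show name = null, indistinguishably.
    spark.read.json(raw).show()

    // Raw view: test for the key's presence before parsing.
    raw.toDF("json")
      .withColumn("has_name_key", col("json").rlike("\"name\"\\s*:"))
      .show(false)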

Implementing Upsert logic Through Streaming

2019-06-25 Thread Sachit Murarka
Hi All, I will receive records continuously as text files (streaming). Each record also carries a timestamp field. The target is an Oracle database. My goal is to maintain the latest record per key in Oracle. Could you please suggest how this can be implemented efficiently? Kind Regards, Sachit Murarka
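One common pattern, assuming Spark 2.4+ so that foreachBatch is available: deduplicate each micro-batch down to the newest record per key, then MERGE it into Oracle over JDBC. Everything below (the input schema and path, the table and column names, the connection string) is an assumption for illustration, and the Oracle JDBC driver must be on the classpath:

    import java.sql.DriverManager
    import org.apache.spark.sql.{DataFrame, Row, SparkSession}
    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions.{col, row_number}

    object StreamingUpsert {
      def upsertBatch(batch: DataFrame, batchId: Long): Unit = {
        // Keep only the newest record per key within this micro-batch.
        val latest = batch
          .withColumn("rn", row_number().over(
            Window.partitionBy("id").orderBy(col("event_ts").desc)))
          .filter(col("rn") === 1)
          .drop("rn")

        latest.foreachPartition { rows: Iterator[Row] =>
          val conn = DriverManager.getConnection(
            "jdbc:oracle:thin:@//dbhost:1521/ORCL", "user", "password")  // assumed
          val merge = conn.prepareStatement(
            "MERGE INTO target t " +
            "USING (SELECT ? AS id, ? AS event_ts, ? AS payload FROM dual) s " +
            "ON (t.id = s.id) " +
            "WHEN MATCHED THEN UPDATE SET t.event_ts = s.event_ts, t.payload = s.payload " +
            "WHERE s.event_ts > t.event_ts " +  // only overwrite with newer data
            "WHEN NOT MATCHED THEN INSERT (id, event_ts, payload) " +
            "VALUES (s.id, s.event_ts, s.payload)")
          try {
            rows.foreach { r =>
              merge.setString(1, r.getAs[String]("id"))
              merge.setTimestamp(2, r.getAs[java.sql.Timestamp]("event_ts"))
              merge.setString(3, r.getAs[String]("payload"))
              merge.addBatch()
            }
            merge.executeBatch()
          } finally conn.close()
        }
      }

      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("upsert-to-oracle").getOrCreate()
        val stream = spark.readStream
          .schema("id STRING, event_ts TIMESTAMP, payload STRING")  // assumed layout
          .csv("/data/incoming")                                    // assumed input path
        stream.writeStream
          .foreachBatch(upsertBatch _)
          .option("checkpointLocation", "/data/checkpoints/upsert")  // assumed
          .start()
          .awaitTermination()
      }
    }

The MERGE's WHERE guard keeps a late-arriving older record from overwriting a newer one already in Oracle, which per-batch deduplication alone cannot guarantee.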

Potential Problem: Dropping malformed tables from CSV (PySpark)

2019-06-25 Thread Conor Begley
Hey, I'm currently working with CSV data and I've come across an unusual case. In Databricks, if I use the following code:

    GA_pages = sqlContext.read.format('com.databricks.spark.csv') \
        .options(header='true', inferschema='true', comment='#', mode='DROPMALFORMED') \
        .load(ga_sessions_path)

…
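For reference, com.databricks.spark.csv was folded into Spark 2.x as the built-in csv format, so an equivalent read looks like this (Scala here; the path stands in for ga_sessions_path):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("csv-dropmalformed").getOrCreate()

    val gaPages = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .option("comment", "#")
      .option("mode", "DROPMALFORMED")  // rows that don't fit the schema are silently discarded
      .csv("/mnt/ga_sessions")          // stands in for ga_sessions_path

Because DROPMALFORMED discards rows silently, row counts can shift without warning; re-reading with mode PERMISSIVE plus a columnNameOfCorruptRecord column is a quick way to inspect which rows are being dropped and why.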