[Structured Streaming][Parquet] How to specify partition and data when saving to Parquet

2018-03-02 Thread karthikjay
My DataFrame has the following schema:

root
 |-- data: struct (nullable = true)
 |    |-- zoneId: string (nullable = true)
 |    |-- deviceId: string (nullable = true)
 |    |-- timeSinceLast: long (nullable = true)
 |-- date: date (nullable = true)

How can I do a writeStream with Parquet format?
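
A minimal sketch of one way to do it, assuming the schema above; the paths are placeholders, and partitionBy on the DataStreamWriter controls the output directory layout:

val query = df
  .select("data.zoneId", "data.deviceId", "data.timeSinceLast", "date")
  .writeStream
  .format("parquet")
  .option("path", "/path/to/output")             // placeholder output path
  .option("checkpointLocation", "/path/to/ckpt") // placeholder checkpoint path
  .partitionBy("date")                           // one subdirectory per date value
  .start()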

Re: Question on Spark-kubernetes integration

2018-03-02 Thread Felix Cheung
For pyspark specifically, IMO it should be very high on the list to port back... As for the roadmap - we should be sharing more soon.

Re: Question on Spark-kubernetes integration

2018-03-02 Thread lucas.g...@gmail.com
Oh interesting, given that pyspark was working in spark on kub 2.2 I assumed it would be part of what got merged. Is there a roadmap for when that may get merged up? Thanks!

Re: Question on Spark-kubernetes integration

2018-03-02 Thread Felix Cheung
That's in the plan. We should be sharing a bit more about the roadmap for future releases shortly. In the meantime, the official documentation covers what is coming: https://spark.apache.org/docs/latest/running-on-kubernetes.html#future-work This support started as a fork of the Apache

Re: [Beginner] How to save Kafka Dstream data to parquet ?

2018-03-02 Thread Tathagata Das
Structured Streaming's file sink solves these problems by writing a log/manifest of all the authoritative files written out (for any format). So if you run batch or interactive queries on the output directory with Spark, it will automatically read the manifest and only process files that are
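
A minimal sketch of what that looks like end to end, with placeholder paths and an assumed streaming DataFrame inputDf:

// The file sink records every committed file in a _spark_metadata
// log under the output path.
inputDf.writeStream
  .format("parquet")
  .option("path", "/data/out")                // placeholder
  .option("checkpointLocation", "/data/ckpt") // placeholder
  .start()

// A later batch read of the same path consults that log and skips
// partial or uncommitted files automatically.
val df = spark.read.parquet("/data/out")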

Re: Pyspark Error: Unable to read a hive table with transactional property set as 'True'

2018-03-02 Thread ayan guha
Hi, Couple of questions:
1. It seems the error is due to number format:

Caused by: java.util.concurrent.ExecutionException: java.lang.NumberFormatException: For input string: "0003024_"
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at

Re: [Beginner] How to save Kafka Dstream data to parquet ?

2018-03-02 Thread Sunil Parmar
Is there a way to get finer control over file writing in the Parquet file writer? We have a streaming application using Apache Apex (on a path of migration to Spark ... story for a different thread). The existing streaming application reads JSON from Kafka and writes Parquet to HDFS. We're trying to
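
For reference, a minimal sketch of that pipeline in Structured Streaming; the broker address, topic name, JSON schema, and paths are all assumptions:

import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types._

// Assumed schema for the incoming JSON records
val schema = new StructType()
  .add("id", StringType)
  .add("value", LongType)

val parsed = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092") // assumption
  .option("subscribe", "events")                    // assumption
  .load()
  .selectExpr("CAST(value AS STRING) AS json")
  .select(from_json(col("json"), schema).as("data"))
  .select("data.*")

parsed.writeStream
  .format("parquet")
  .option("path", "hdfs:///data/events")             // placeholder
  .option("checkpointLocation", "hdfs:///data/ckpt") // placeholder
  .start()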

Re: Can I get my custom spark strategy to run last?

2018-03-02 Thread Vadim Semenov
Something like this?

sparkSession.experimental.extraStrategies = Seq(Strategy)
val logicalPlan = df.logicalPlan
val newPlan: LogicalPlan = Strategy(logicalPlan)
Dataset.ofRows(sparkSession, newPlan)
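
For context, a minimal sketch of how a custom strategy is defined and registered; MyStrategy here is a hypothetical placeholder. Since extraStrategies are tried before the built-in strategies, the snippet above applies the strategy to the plan by hand to make it effectively run last:

import org.apache.spark.sql.{SparkSession, Strategy}
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.execution.SparkPlan

// Hypothetical strategy: returning Nil makes the planner fall through
// to the next strategy in the list.
object MyStrategy extends Strategy {
  override def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
    case _ => Nil
  }
}

val spark = SparkSession.builder.master("local[*]").getOrCreate()
spark.experimental.extraStrategies = Seq(MyStrategy)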

Pyspark Error: Unable to read a hive table with transactional property set as 'True'

2018-03-02 Thread Debabrata Ghosh
Hi All, Greetings! I need some help reading a Hive table via PySpark for which the transactional property is set to 'True' (in other words, the ACID property is enabled). Following is the entire stacktrace and the description of the Hive table. Would you please be able to help

Spark Streaming reading many topics with Avro

2018-03-02 Thread Guillermo Ortiz
Hello, I want to read several topics with a single Spark Streaming process. I'm using Avro, and the data in the different topics has different schemas. Ideally, if I had only one topic I could implement a deserializer, but I don't know if it's possible with many different schemas. val
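
One possible approach, sketched below: a Kafka Deserializer that keys the Avro schema off the topic name. The class name and the schema map are assumptions; a schema registry lookup would serve the same purpose:

import org.apache.avro.Schema
import org.apache.avro.generic.{GenericDatumReader, GenericRecord}
import org.apache.avro.io.DecoderFactory
import org.apache.kafka.common.serialization.Deserializer

// Hypothetical deserializer: Kafka passes the topic name to
// deserialize(), so the schema can be chosen per topic.
class PerTopicAvroDeserializer(schemasByTopic: Map[String, Schema])
    extends Deserializer[GenericRecord] {

  override def configure(configs: java.util.Map[String, _], isKey: Boolean): Unit = ()

  override def deserialize(topic: String, data: Array[Byte]): GenericRecord = {
    val reader = new GenericDatumReader[GenericRecord](schemasByTopic(topic))
    reader.read(null, DecoderFactory.get.binaryDecoder(data, null))
  }

  override def close(): Unit = ()
}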

Question on Spark-kubernetes integration

2018-03-02 Thread Lalwani, Jayesh
Does the Resource scheduler support dynamic resource allocation? Are there any plans to add it in the future?

Re: K Means Clustering Explanation

2018-03-02 Thread Matt Hicks
Thanks Alessandro and Christoph. I appreciate the feedback, but I'm still having issues determining how to actually accomplish this with the API. Can anyone point me to an example in code showing how to accomplish this?

Re: K Means Clustering Explanation

2018-03-02 Thread Alessandro Solimando
Hi Matt, similarly to what Christoph does, I first derive the cluster id for the elements of my original dataset, and then I use a classification algorithm (cluster ids being the classes here). For this method to be useful you need a "human-readable" model; tree-based models are generally a good
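
A minimal sketch of that recipe with Spark ML, assuming a DataFrame df that has a "features" vector column; k and the column names are placeholders:

import org.apache.spark.ml.clustering.KMeans
import org.apache.spark.ml.classification.DecisionTreeClassifier
import org.apache.spark.sql.functions.col

// Step 1: cluster, then attach the cluster id to every row
// (KMeans writes it to the "prediction" column)
val kmeans = new KMeans().setK(3).setFeaturesCol("features")
val clustered = kmeans.fit(df).transform(df)
  .withColumn("cluster", col("prediction").cast("double"))

// Step 2: fit an interpretable classifier with cluster ids as labels
val tree = new DecisionTreeClassifier()
  .setLabelCol("cluster")
  .setFeaturesCol("features")
  .fit(clustered)

// The splits in the printed tree describe what characterizes each cluster
println(tree.toDebugString)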