Please help me with the error below and suggest a different approach to the
data manipulation described below.
Error: Unable to find encoder for type stored in a Dataset. Primitive types
(Int, String, etc) and Product types (case classes) are supported by
importing spark.implicits._ Support for serializing other types will be added
in future releases.
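If it helps, here is a minimal sketch of the usual fix (the Record case class
and the data are just placeholders): define the case class at the top level so
Spark can derive a Product encoder for it, and bring spark.implicits._ into
scope before creating the Dataset.

import org.apache.spark.sql.SparkSession

// Case class defined at the top level so Spark can derive a Product encoder for it.
case class Record(id: Int, name: String)

object EncoderExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("encoder-example")
      .master("local[*]")
      .getOrCreate()

    // Brings the implicit Encoders for primitives, case classes and tuples into scope.
    import spark.implicits._

    val ds = Seq(Record(1, "a"), Record(2, "b")).toDS()
    ds.show()

    spark.stop()
  }
}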
You mean I should start two Spark Streaming applications and read the topics
separately?
Regards,
Junfeng Chen
On Tue, Apr 3, 2018 at 10:31 PM, naresh Goud
wrote:
> I don’t see any option other than starting two individual queries. It’s
> just a thought.
>
> Thank you,
I'm doing a Spark test with Spark Streaming, Cassandra and Kafka.
I have an action that takes a DStream as input, saves it to Cassandra and
sometimes puts some elements in Kafka.
I'm using https://github.com/holdenk/spark-testing-base, with Kafka and
Cassandra running locally.
My method looks like:
def
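The method got cut off above, so here is only a rough sketch of the shape such
a method usually takes (the Event case class, the keyspace, table and topic
names are all placeholders; the Cassandra write assumes the
spark-cassandra-connector API and the Kafka write a plain KafkaProducer):

import org.apache.spark.streaming.dstream.DStream
import com.datastax.spark.connector._   // spark-cassandra-connector
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import java.util.Properties

case class Event(id: String, value: Double)   // hypothetical payload

// Save every batch to Cassandra and forward some elements to Kafka.
def saveAndForward(stream: DStream[Event]): Unit = {
  stream.foreachRDD { rdd =>
    // Write the whole batch to Cassandra (keyspace/table are placeholders).
    rdd.saveToCassandra("my_keyspace", "events")

    // Send a subset of the elements to Kafka, one producer per partition.
    rdd.filter(_.value > 100.0).foreachPartition { events =>
      val props = new Properties()
      props.put("bootstrap.servers", "localhost:9092")
      props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
      props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
      val producer = new KafkaProducer[String, String](props)
      events.foreach(e => producer.send(new ProducerRecord("alerts", e.id, e.value.toString)))
      producer.close()
    }
  }
}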
Hi,
I am going through the presentation
https://databricks.com/session/hive-bucketing-in-apache-spark.
Do we need to bucket both tables for this to work? And is it mandatory that
the bucket counts be multiples of each other?
Also, if I export a persistent table to S3, will this
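For reference, a small sketch of how bucketed tables are written from Spark
(the table and column names and the bucket count of 8 are made up); both join
sides here are bucketed and sorted on the join key, so the resulting plan can
be inspected with explain():

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("bucketing-sketch").master("local[*]").getOrCreate()
import spark.implicits._

// Made-up data; both sides are written bucketed and sorted on the join key.
val orders    = Seq((1, "o-100"), (2, "o-101")).toDF("customer_id", "order_id")
val customers = Seq((1, "alice"), (2, "bob")).toDF("customer_id", "name")

orders.write.bucketBy(8, "customer_id").sortBy("customer_id").saveAsTable("orders_bucketed")
customers.write.bucketBy(8, "customer_id").sortBy("customer_id").saveAsTable("customers_bucketed")

// With both tables bucketed on customer_id, check the plan for the absence of a shuffle.
spark.table("orders_bucketed")
  .join(spark.table("customers_bucketed"), "customer_id")
  .explain()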
It turns out that the weight was too large (with a mean around 5000 and a
standard deviation around 8000) and caused overflow. After scaling the
weights down to, for example, numbers between 0 and 1, the code converged
nicely. Spark did not report the overflow issue; we actually found it out by
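For anyone hitting the same thing, a rough sketch of the rescaling we mean,
assuming an active SparkSession named spark (e.g. in spark-shell) and a
DoubleType "weight" column; the sample values are made up. It min-max scales
the weights into [0, 1] before training:

import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.{col, max, min}
import spark.implicits._

// Made-up raw weights with a large mean/stddev, like the ones that overflowed.
val raw = Seq(12000.0, 4800.0, 150.0, 9700.0).toDF("weight")

// Min-max scale the weights into [0, 1] before feeding them to the model.
val Row(lo: Double, hi: Double) = raw.agg(min("weight"), max("weight")).head()
val scaled = raw.withColumn("weight_scaled", (col("weight") - lo) / (hi - lo))
scaled.show()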
On 28. mars 2018 03:26, Dongjoon Hyun wrote:
> You may hit SPARK-23355 (convertMetastore should not ignore table properties).
>
> Since it's a known Spark issue for all Hive tables (Parquet/ORC), could you
> check that too?
>
> Bests,
> Dongjoon.
>
Hi,
I think you might be right, I can run
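In case it helps to narrow it down, one way to check whether the
convertMetastore path is involved is to disable the conversion and re-run the
query; the table name below is a placeholder, and spark is assumed to be an
active SparkSession:

// Re-run the failing query with Hive table conversion disabled, to see whether
// the behaviour changes in the way SPARK-23355 describes.
spark.conf.set("spark.sql.hive.convertMetastoreParquet", "false")
spark.conf.set("spark.sql.hive.convertMetastoreOrc", "false")

spark.sql("SELECT * FROM my_table LIMIT 10").show()   // my_table is a placeholder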
Hi,
the other thing that you may try doing is to use the following in your SQL
and then, based on regular expressions, filter out records according to which
directory they came from. But I would be very interested to know the details
I asked for in my earlier email.
input_file_name()
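Here is a rough sketch of what I mean (the input path and the regular
expression are placeholders, and spark is an active SparkSession):

import org.apache.spark.sql.functions.{col, input_file_name}

// Tag each row with the file it came from, then keep only rows whose source
// path matches the directory pattern you care about.
val tagged = spark.read.json("/data/landing/*")
  .withColumn("src_file", input_file_name())

val onlyDirA = tagged.filter(col("src_file").rlike(".*/dir_a/.*"))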
Hi,
I think that what you are facing is documented in the Spark docs:
http://spark.apache.org/docs/latest/rdd-programming-guide.html#understanding-closures-
May I ask what you are trying to achieve here? From what I understand, you
have a list of JSON files which you want to read separately, as they
Whenever Spark reads data it will keep it in executor memory until and unless
there is no room for newly read or processed data. This is the beauty of
Spark.
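If you want to make that reuse explicit rather than rely on what happens to
still be in memory, a small sketch (the path and the filter are placeholders,
spark is an active SparkSession): cache the dataset so the second action reads
the cached blocks instead of recomputing from the source.

import org.apache.spark.storage.StorageLevel

// Read once, mark the result for caching, and reuse it across two actions:
// the second action reads the cached blocks instead of recomputing from the source.
val df = spark.read.parquet("/data/events").persist(StorageLevel.MEMORY_AND_DISK)

val firstCount  = df.count()                         // materialises the cache
val secondCount = df.filter("value > 1000").count()  // served from the cache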
On Tue, Apr 3, 2018 at 12:42 AM snjv wrote:
> Hi,
>
> When we execute the same operation twice, spark
From Spark's point of view it shouldn't have any effect: it's possible to add
columns in new Parquet files, it won't hurt performance, and the Spark
application code doesn't need to change.
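A small illustration of that (the path is a placeholder, spark is an active
SparkSession): old and new Parquet files can be read together, and schema
merging can be requested explicitly when the footers differ; the columns
missing from older files simply come back as null.

// Older files lack the new columns and newer files carry them; merging the
// schemas reconciles the footers, and the missing values come back as null.
val df = spark.read
  .option("mergeSchema", "true")
  .parquet("/data/events")

df.printSchema()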
On Tue, Apr 3, 2018 at 9:14 AM Vitaliy Pisarev
wrote:
> This is not strictly a
This is not strictly a Spark question but I'll give it a shot:
I have an existing setup of Parquet files that are being queried from Impala
and from Spark.
I intend to add some 30 relatively 'heavy' columns to the Parquet files. Each
column would store an array of structs. Each struct can have from 5 to