Hi Andrey,
Below is the description of MEMORY_ONLY from
https://spark.apache.org/docs/latest/rdd-programming-guide.html
"Store RDD as deserialized Java objects in the JVM. If the RDD does not fit
in memory, some partitions will not be cached and will be recomputed on the
fly each time they're
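For what it's worth, requesting this storage level on an existing RDD is a
one-liner (a minimal sketch; `rdd` stands for any RDD already in scope):

import org.apache.spark.storage.StorageLevel

// MEMORY_ONLY is also what rdd.cache() uses under the hood.
rdd.persist(StorageLevel.MEMORY_ONLY)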
> |-- name: string (nullable = true)
> |-- age: integer (nullable = false)
>
>
> On Wed, Feb 27, 2019 at 6:51 PM Hien Luu wrote:
>
>> Thanks for looking into this. Does this mean string fields should always
>> be nullable?
>>
>> You are right that the re
ll"]},{"name":"age","type":"int"}]}
>
> scala> dfKV.select(from_avro('value, avroTypeStruct)).show
> +------------------------+
> |from_avro(value, struct)|
> +------------------------+
> |         [Mary Jane, 25]|
> +------------------------+
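For reference, a self-contained round trip along the same lines (a sketch
assuming Spark 2.4 with the spark-avro module on the classpath; in
spark-shell, spark.implicits._ is already in scope, which toDF and $ below
rely on):

import org.apache.spark.sql.avro._
import org.apache.spark.sql.functions.struct

val avroTypeStruct = """{"type":"record","name":"struct","fields":[{"name":"name","type":["string","null"]},{"name":"age","type":"int"}]}"""

val df = Seq(("Mary Jane", 25)).toDF("name", "age")
// Encode both columns into a single Avro binary column, then decode it back.
val dfKV = df.select(to_avro(struct($"name", $"age")).as("value"))
dfKV.select(from_avro($"value", avroTypeStruct)).show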
Hi,
I ran into a pretty weird issue with to_avro and from_avro, where it was not
able to parse the data in a struct correctly. Please see the simple and
self-contained example below. I am using Spark 2.4. I am not sure if I
missed something.
This is how I start the spark-shell on my Mac:
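(The command itself is cut off in the archive. In Spark 2.4 the
from_avro/to_avro functions live in the external spark-avro module, so the
invocation was presumably along these lines; the exact coordinates are an
assumption:)

./bin/spark-shell --packages org.apache.spark:spark-avro_2.11:2.4.0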
Hi Farshid,
Take a look at this example on github -
https://github.com/hienluu/structured-streaming-sources.
Cheers,
Hien
On Thu, Jul 12, 2018 at 12:52 AM Farshid Zavareh
wrote:
> Hello.
>
> I need to create a custom streaming source by extending *FileStreamSource*.
> The idea is to override
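(The message is truncated here. For context, a minimal skeleton of the
Source trait that FileStreamSource itself implements; the class name and
method bodies below are illustrative, not from the original thread:)

import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.execution.streaming.{LongOffset, Offset, Source}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

class MyCustomSource(sqlContext: SQLContext) extends Source {

  override def schema: StructType =
    StructType(StructField("value", StringType) :: Nil)

  // Highest offset available so far; None means no data has arrived yet.
  override def getOffset: Option[Offset] = Some(LongOffset(0L))

  // Produce the rows between start (exclusive) and end (inclusive).
  override def getBatch(start: Option[Offset], end: Offset): DataFrame = ???

  override def stop(): Unit = ()
}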
Finally got a toy version of a Structured Streaming DataSource V2 source
working with Apache Spark 2.3. Tested locally and on Databricks community
edition.
Source code is here - https://github.com/hienluu/wikiedit-streaming
--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
I finally got around to implementing a custom structured streaming receiver
(source) to read Wikipedia edit events from the IRC server.
It works fine locally as well as in spark-shell on my laptop. However, it
failed with the following exception when running in Databricks community
edition.
It
Hi Kant,
I am not sure whether you have come up with a solution yet, but the
following works for me (in Scala):
val emp_info = """
[
{"name": "foo", "address": {"state": "CA", "country": "USA"},
"docs":[{"subject": "english", "year": 2016}]},
{"name": "bar", "address": {"state": "OH",
Cool. Thanks nezhazheng. I will give it a shot.
--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
Hi TD,
I looked at the DataStreamReader class, and it looks like we can specify an
FQCN as a source (provided that it implements the trait Source). The
DataSource.lookupDataSource function will try to load this FQCN during the
creation of a DataSource object instance inside the DataStreamReader.load().
Will
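(The message breaks off here. For illustration, loading a custom source by
FQCN looks roughly like this; the provider class name is hypothetical:)

// DataSource.lookupDataSource resolves the FQCN inside load().
val events = spark.readStream
  .format("com.example.streaming.MyCustomSourceProvider")
  .load()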
r local repo?
>
> Best, RS
> On May 2, 2016 8:51 PM, "Hien Luu" <hien...@gmail.com> wrote:
>
>> Hi all,
>>
>> I am running into a build problem with com.oracle:ojdbc6:jar:11.2.0.1.0.
>> It kept getting "Operation timed out" while building Spark P
Hi all,
I am running into a build problem with com.oracle:ojdbc6:jar:11.2.0.1.0.
It kept getting "Operation timed out" while building Spark Project Docker
Integration Tests module (see the error below).
Has anyone run into this problem before? If so, how did you work around
it?
[INFO]
> from you. See StateDStream.scala, there is everything you need to know.
>
> On Fri, Nov 6, 2015 at 6:25 PM Hien Luu <h...@linkedin.com> wrote:
>
>> Hi,
>>
>> I am interested in learning about the implementation of
>> updateStateByKey. Does anyone kn
Hi,
I am interested in learning about the implementation of updateStateByKey.
Does anyone know of a JIRA or design doc I can read?
I did a quick search and couldn't find much info on the implementation.
Thanks in advance,
Hien
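For anyone following along, a minimal usage sketch of the API under
discussion (assuming a StreamingContext `ssc` and a DStream of
(word, count) pairs named `pairs`; stateful transformations also require a
checkpoint directory):

ssc.checkpoint("/tmp/checkpoint")

// Keep a running count per key across batches; newValues holds the
// current batch's values for that key.
val updateFunc = (newValues: Seq[Int], state: Option[Int]) =>
  Some(newValues.sum + state.getOrElse(0))

val runningCounts = pairs.updateStateByKey[Int](updateFunc)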
>>
>> On Fri, Aug 7, 2015 at 11:25 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>>
>>> In my opinion, choosing some particular project among its peers should
>>> leave enough room for future growth (which may come faster than you
>>> initially think).
>>
Cheers
On Fri, Aug 7, 2015 at 11:23 AM, Hien Luu h...@linkedin.com wrote:
Scalability is a known issue due to the current architecture. However,
this will only be applicable if you run more than 20K jobs per day.
On Fri, Aug 7, 2015 at 10:30 AM, Ted Yu yuzhih...@gmail.com wrote
This blog outlines a few things that make Spark faster than MapReduce -
https://databricks.com/blog/2014/10/10/spark-petabyte-sort.html
On Fri, Aug 7, 2015 at 9:13 AM, Muler mulugeta.abe...@gmail.com wrote:
Consider the classic word count application over a 4-node cluster with a
sizable
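(The snippet is cut off in the archive. For reference, the classic word
count in Spark; the input path is illustrative:)

val counts = sc.textFile("hdfs:///data/input")
  .flatMap(_.split("\\s+"))
  .map(word => (word, 1))
  .reduceByKey(_ + _)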
Looks like Oozie can satisfy most of your requirements.
On Fri, Aug 7, 2015 at 8:43 AM, Vikram Kone vikramk...@gmail.com wrote:
Hi,
I'm looking for open source workflow tools/engines that allow us to
schedule spark jobs on a datastax cassandra cluster. Since there are tonnes
of