Re: How more than one spark job can write to same partition in the parquet file

2019-12-11 Thread ayan guha
We partitioned the data logically for the two different jobs; in our use case,
based on geography (see the sketch below)...
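A minimal sketch of that kind of split (hypothetical column and path names):
each job filters to its own geography, so the two writers always land in
disjoint partition directories.

import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder().appName("emea-writer").getOrCreate()
import spark.implicits._

// Hypothetical input path and columns; the second job filters geo === "APAC" instead.
val events = spark.read.parquet("/data/events_raw")

events
  .filter($"geo" === "EMEA")
  .write
  .mode(SaveMode.Append)
  .partitionBy("geo", "dt")   // disjoint geo values => disjoint partition directories
  .parquet("/data/events")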

On Thu, 12 Dec 2019 at 3:39 pm, Chetan Khatri 
wrote:

> Thanks. If you can share an alternative design, I would love to hear from
> you.
>
> On Wed, Dec 11, 2019 at 9:34 PM ayan guha  wrote:
>
>> No, we faced problems with that setup.
>>
>> On Thu, 12 Dec 2019 at 11:14 am, Chetan Khatri <
>> chetan.opensou...@gmail.com> wrote:
>>
>>> Hi Spark Users,
>>> Would it be possible to write to the same partition of a Parquet file
>>> through two concurrent Spark jobs with different Spark sessions?
>>>
>>> thanks
>>>
>> --
>> Best Regards,
>> Ayan Guha
>>
> --
Best Regards,
Ayan Guha


Re: How more than one spark job can write to same partition in the parquet file

2019-12-11 Thread Chetan Khatri
Thanks. If you can share an alternative design, I would love to hear from
you.

On Wed, Dec 11, 2019 at 9:34 PM ayan guha  wrote:

> No, we faced problems with that setup.
>
> On Thu, 12 Dec 2019 at 11:14 am, Chetan Khatri <
> chetan.opensou...@gmail.com> wrote:
>
>> Hi Spark Users,
>> Would it be possible to write to the same partition of a Parquet file
>> through two concurrent Spark jobs with different Spark sessions?
>>
>> thanks
>>
> --
> Best Regards,
> Ayan Guha
>


subscribe

2019-12-11 Thread Genieliu




Genieliu
feixiang...@163.com
China



Re: How more than one spark job can write to same partition in the parquet file

2019-12-11 Thread ayan guha
No, we faced problems with that setup.

On Thu, 12 Dec 2019 at 11:14 am, Chetan Khatri 
wrote:

> Hi Spark Users,
> Would it be possible to write to the same partition of a Parquet file
> through two concurrent Spark jobs with different Spark sessions?
>
> thanks
>
-- 
Best Regards,
Ayan Guha


Re: spark-shell, how it works internally

2019-12-11 Thread mykidong
I have found a source describing how Spark compiles code in the REPL and
dynamically loads it into the distributed executors:
https://ardoris.wordpress.com/2014/03/30/how-spark-does-class-loading/

If you run the Spark REPL, you can find a Spark configuration entry like this:
"spark.repl.class.uri":"spark://xxx:41827/classes"

A REPL class fetch server runs in the Spark REPL driver at this URI to serve
the classes compiled by the REPL's Spark interpreter.
The distributed executors fetch those classes from the server at the
"spark.repl.class.uri" URI and load them through the ExecutorClassLoader.
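A small spark-shell sketch of that flow (output will differ per cluster): a
class defined in the REPL is compiled by the interpreter, served from the
driver, and fetched by the executors when the closure runs.

// Paste into spark-shell. The REPL compiles Point and serves its bytecode
// from the driver.
case class Point(x: Double, y: Double)

// Running the closure on executors forces them to load Point via
// ExecutorClassLoader, which fetches the class bytes from the driver.
val xs = sc.parallelize(1 to 4).map(i => Point(i, i * 2.0).x).collect()

// The fetch endpoint set by the REPL driver, e.g. "spark://xxx:41827/classes".
println(sc.getConf.getOption("spark.repl.class.uri"))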

I have also looked into the Spark and Zeppelin source code to use only the
Spark interpreter, rather than the entire REPL.

I have picked up some code from Zeppelin and Spark to run the Spark
interpreter in my application.
In my application, an embedded HTTP server handles the Spark code submitted
by users; that code is interpreted dynamically and executed on the
distributed executors, just as the Spark REPL does. It works for now!

For my application there is some more research to do, for instance how to
handle multiple users, each with an individual Spark session.

Cheers,

- Kidong Lee.



How more than one spark job can write to same partition in the parquet file

2019-12-11 Thread Chetan Khatri
Hi Spark Users,
Would it be possible to write to the same partition of a Parquet file
through two concurrent Spark jobs with different Spark sessions?

Thanks
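For reference, a minimal sketch of the setup being asked about (hypothetical
paths and column names). Whether it is safe depends on the output committer:
with the default FileOutputCommitter, both jobs stage their output under the
same _temporary directory beneath the base path, so concurrent commits and
cleanups can clash.

import org.apache.spark.sql.{SaveMode, SparkSession}
import org.apache.spark.sql.functions.lit

// Job A (its own application / SparkSession)
val sparkA = SparkSession.builder().appName("writer-a").getOrCreate()
sparkA.range(100)
  .withColumn("dt", lit("2019-12-11"))
  .write.mode(SaveMode.Append)
  .partitionBy("dt")
  .parquet("/data/events")

// Job B, launched concurrently from a second application, appends to the
// same partition dt=2019-12-11 under /data/events -- this is the contended case.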


Unsubscribe

2019-12-11 Thread Davide Mandrini



Unsubscribe

2019-12-11 Thread Ryan Victory



Spark streaming when a node or nodes go down

2019-12-11 Thread Mich Talebzadeh
Hi,

I know this is a basic question, but someone enquired about it and I just
wanted to fill my knowledge gap, so to speak.

Within the context of Spark Streaming, an RDD is created from the incoming
topic; the RDD is partitioned, and each Spark node operates on a partition at
a time. This series of operations is merged together to create a DAG. Does
that mean the DAG keeps track of the operations performed?

If a node goes down, the driver (application master) knows about it. It then
tries to assign another node to continue processing that RDD partition from
the same place. This works, but with reduced performance. However, to be able
to handle the lost partition, *the data has to be available to all nodes from
the beginning.* Since we are talking about Spark Streaming, not Spark reading
from HDFS, a Hive table, etc., is the assumption that the streaming data is
cached on every single node? Is that correct?
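
A hedged sketch of the usual answer for the Kafka case (hypothetical topic,
broker, and checkpoint names): with the direct stream, nothing is pre-cached
on every node. Each RDD partition only carries its Kafka offset range, so a
replacement executor recomputes a lost partition by re-reading that offset
range from Kafka. Receiver-based streams instead replicate received blocks
(two copies by default), not to every node.

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

val conf = new SparkConf().setAppName("kafka-lineage-demo")
val ssc = new StreamingContext(conf, Seconds(10))
ssc.checkpoint("/checkpoints/kafka-lineage-demo")  // hypothetical path; stores DAG/offset metadata

val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "broker1:9092",           // hypothetical broker
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "lineage-demo",
  "auto.offset.reset" -> "latest",
  "enable.auto.commit" -> (false: java.lang.Boolean)
)

// Each batch's RDD partitions carry only (topic, partition, fromOffset, untilOffset).
// If an executor dies, the lost partition is recomputed elsewhere by re-reading
// that exact offset range from Kafka; no copy needs to live on every node.
val stream = KafkaUtils.createDirectStream[String, String](
  ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

stream.map(_.value).count().print()

ssc.start()
ssc.awaitTermination()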

Thanks

Dr Mich Talebzadeh



LinkedIn:
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.