Re: Sharing RDDS across applications and users

vincent gromakowski Fri, 28 Oct 2016 03:02:01 -0700

Bad idea. No caching, cluster overconsumption...
Have a look at instantiating a custom Thrift server on temp tables with the
fair scheduler to allow concurrent SQL requests. It's not a public API, but
you can find some examples.
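
As a rough illustration of why the fair scheduler matters for concurrent SQL
requests, here is a toy model in plain Python. It is not Spark's scheduler:
the `simulate` function, the job names, and the one-tick-per-task cost are
all assumptions made up for this sketch. It only mimics the documented
behaviour that FIFO gives the earliest job priority on all free slots, while
FAIR hands out task slots round-robin across the running jobs.

```python
from collections import deque


def simulate(jobs, slots, mode):
    """Return {job_name: finish_tick} for a toy cluster with `slots` task slots.

    jobs:  list of (name, task_count) in submission order; each task takes one tick.
    mode:  "FIFO" gives each tick's slots to the earliest unfinished job first;
           "FAIR" hands slots out round-robin across all unfinished jobs.
    """
    remaining = dict(jobs)  # dict keeps insertion order, i.e. submission order
    finished = {}
    tick = 0
    while remaining:
        tick += 1
        free = slots
        if mode == "FIFO":
            # Earliest job takes as many slots as it can use; later jobs get leftovers.
            for name in list(remaining):
                used = min(free, remaining[name])
                remaining[name] -= used
                free -= used
                if free == 0:
                    break
        elif mode == "FAIR":
            # One slot per job per turn, cycling until the slots or the work run out.
            queue = deque(remaining)
            while free and queue:
                name = queue.popleft()
                remaining[name] -= 1
                free -= 1
                if remaining[name] > 0:
                    queue.append(name)
        else:
            raise ValueError(f"unknown mode: {mode}")
        # Record and drop every job that ran out of tasks during this tick.
        for name in [n for n, left in remaining.items() if left == 0]:
            finished[name] = tick
            del remaining[name]
    return finished


if __name__ == "__main__":
    # Hypothetical workload: a long query submitted just before a short one.
    jobs = [("big_report", 8), ("quick_lookup", 2)]
    print("FIFO:", simulate(jobs, slots=2, mode="FIFO"))
    print("FAIR:", simulate(jobs, slots=2, mode="FAIR"))
```

With two slots, the short job finishes at tick 5 under FIFO but at tick 2
under FAIR, while the long job slips from tick 4 to tick 5. That is the
trade-off behind using fair scheduling when several users query one shared
context.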


On 28 Oct 2016 at 11:12 AM, "Mich Talebzadeh" <mich.talebza...@gmail.com>
wrote:

> Hi,
>
> I think a tempTable is private to the session that creates it. In Hive, temp
> tables created by "CREATE TEMPORARY TABLE" are all private to the session.
> Spark is no different.
>
> The alternative may be for everyone to create a tempTable from the same DF?
>
> HTH
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
> On 28 October 2016 at 10:03, Chanh Le <giaosu...@gmail.com> wrote:
>
>> Can you elaborate on how to implement "shared sparkcontext and fair
>> scheduling" option?
>>
>>
>> It just reuses one SparkContext by not letting it stop when the application
>> is done. Check out Livy and spark-jobserver.
>> FAIR (https://spark.apache.org/docs/1.2.0/job-scheduling.html) is just how
>> you schedule your jobs in the pool, but FAIR helps you run jobs in parallel,
>> whereas FIFO (the default) runs one job at a time.
>>
>>
>> My approach was to use  sparkSession.getOrCreate() method and register
>> temp table in one application. However, I was not able to access this
>> tempTable in another application.
>>
>>
>> Storing metadata in Hive may help, but I am not sure about this.
>> I use the Spark Thrift Server, create the table on it, and then let
>> Zeppelin query from it.
>>
>> Regards,
>> Chanh
>>
>>
>>
>>
>>
>> On Oct 27, 2016, at 9:01 PM, Victor Shafran <victor.shaf...@equalum.io>
>> wrote:
>>
>> Hi Vincent,
>> Can you elaborate on how to implement "shared sparkcontext and fair
>> scheduling" option?
>>
>> My approach was to use  sparkSession.getOrCreate() method and register
>> temp table in one application. However, I was not able to access this
>> tempTable in another application.
>> Your help is highly appreciated.
>> Victor
>>
>> On Thu, Oct 27, 2016 at 4:31 PM, Gene Pang <gene.p...@gmail.com> wrote:
>>
>>> Hi Mich,
>>>
>>> Yes, Alluxio is commonly used to cache and share Spark RDDs and
>>> DataFrames among different applications and contexts. The data typically
>>> stays in memory, but with Alluxio's tiered storage, the "colder" data can
>>> be evicted to other media, like SSDs and HDDs. Here is a blog post
>>> discussing Spark RDDs and Alluxio:
>>> https://www.alluxio.com/blog/effective-spark-rdds-with-alluxio
>>>
>>> Alluxio also has the concept of an "Under filesystem", which can
>>> help you access your existing data across different storage systems. Here
>>> is more information about the unified namespace abilities:
>>> http://www.alluxio.org/docs/master/en/Unified-and-Transparent-Namespace.html
>>>
>>> Hope that helps,
>>> Gene
>>>
>>> On Thu, Oct 27, 2016 at 3:39 AM, Mich Talebzadeh <
>>> mich.talebza...@gmail.com> wrote:
>>>
>>>> Thanks Chanh,
>>>>
>>>> Can it share RDDs?
>>>>
>>>> Personally I have not used either Alluxio or Ignite.
>>>>
>>>>
>>>>    1. Are there major differences between these two?
>>>>    2. Have you tried Alluxio for sharing Spark RDDs, and if so, do you
>>>>    have any experience you can kindly share?
>>>>
>>>> Regards
>>>>
>>>>
>>>> Dr Mich Talebzadeh
>>>>
>>>>
>>>> On 27 October 2016 at 11:29, Chanh Le <giaosu...@gmail.com> wrote:
>>>>
>>>>> Hi Mich,
>>>>> Alluxio is a good option to go with.
>>>>>
>>>>> Regards,
>>>>> Chanh
>>>>>
>>>>> On Oct 27, 2016, at 5:28 PM, Mich Talebzadeh <
>>>>> mich.talebza...@gmail.com> wrote:
>>>>>
>>>>>
>>>>> There was a mention of using Zeppelin to share RDDs with many users.
>>>>> From the notes on Zeppelin, it appears that this is sharing the UI, and I
>>>>> am not sure how easy it is going to be to change the result set with
>>>>> different users modifying, say, SQL queries.
>>>>>
>>>>> There is also the idea of caching RDDs with something like Apache
>>>>> Ignite. Has anyone really tried this? Will that work with multiple
>>>>> applications?
>>>>>
>>>>> It looks feasible as RDDs are immutable and so are registered
>>>>> tempTables etc.
>>>>>
>>>>> Thanks
>>>>>
>>>>>
>>>>> Dr Mich Talebzadeh
>>>>>
>>>>>
>>>>
>>>
>>
>>
>> --
>>
>> Victor Shafran
>>
>> VP R&D| Equalum
>>
>> Mobile: +972-523854883 | Email: victor.shaf...@equalum.io
>>
>>
>>
>
