Re: hive on spark - version question

2017-03-17 Thread Stephen Sprague
yeah but... is the glass half-full or half-empty?  sure this might suck but
keep your head high, bro! Lots of it (hive) does work. :)


On Fri, Mar 17, 2017 at 2:25 PM, hernan saab wrote:

Re: hive on spark - version question

2017-03-17 Thread hernan saab
Stephan,

Thanks for the response.

The one thing I don't appreciate about those who promote and DOCUMENT Hive on
Spark is that there seems to be no evidence at all that Hive on Spark WORKS.
In fact, after a lot of pain, I noticed that it is not supported by just
about anybody.

If someone dares to document Hive on Spark (see
https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started),
why can't they have the decency to mention which specific combination of
Hadoop/Spark/Hive versions they used that works? Include a git repo in the doc
with all the right versions and libraries. Why not? We could start from there
and progressively move to newer libraries if the doc becomes stale. I am not
asking for much; I just want to know what the documenter used to claim that
Hive on Spark works, that's it.
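
The thread never settles on a known-good Hadoop/Spark/Hive combination, but the "git repo with all the right versions" idea can be sketched as a small pinned-version manifest kept next to the deployment scripts. This is a minimal sketch under stated assumptions, not anything shipped by Hive, Spark or Ambari: the class `StackManifest`, its `Status` values and the key format are hypothetical. The only recorded entry is the pairing reported in this thread (Hive 2.1.0 with Spark 2.1.0 failing on `org/apache/spark/JavaSparkListener`); every other pairing is reported as untested until someone actually verifies it.

```java
import java.util.Map;

// Hypothetical pinned-version manifest; not part of Hive, Spark or Ambari.
final class StackManifest {

    enum Status { VERIFIED, BROKEN, UNTESTED }

    // Only the pairing reported in this thread is recorded. Add VERIFIED
    // entries only after a combination has been tested end to end.
    private static final Map<String, Status> KNOWN = Map.of(
            key("2.1.0", "2.1.0"), Status.BROKEN // NoClassDefFoundError: org/apache/spark/JavaSparkListener
    );

    private StackManifest() {}

    // Builds the lookup key for a Hive/Spark version pair.
    static String key(String hiveVersion, String sparkVersion) {
        return "hive=" + hiveVersion + ";spark=" + sparkVersion;
    }

    // Looks up a pair; anything not recorded is treated as untested.
    static Status check(String hiveVersion, String sparkVersion) {
        return KNOWN.getOrDefault(key(hiveVersion, sparkVersion), Status.UNTESTED);
    }

    public static void main(String[] args) {
        System.out.println(check("2.1.0", "2.1.0")); // BROKEN
        System.out.println(check("2.1.0", "1.6.0")); // UNTESTED
    }
}
```

Failing a deployment on BROKEN and warning on UNTESTED is one possible policy; the point is that the tested combination lives in version control rather than in someone's memory.
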

Clearly, in most cases this setup is broken, and it misleads people into
wasting time on it.

I love this tech. But I do notice some mean-spirited or very negligent actions
by the Apache development community. Documenting Hive on Spark while knowing it
won't work in most cases means Apache developers don't give a crap about the
time wasted by people like us.

 

On Friday, March 17, 2017 1:14 PM, Edward Capriolo wrote:

Re: hive on spark - version question

2017-03-17 Thread Stephen Sprague
thanks for the comments, and for sure all relevant. And yeah, I feel the pain
just like the next guy, but that's part of the open-source "lifestyle"
you subscribe to when using it.

The upside payoff has gotta be worth the downside risk - or else forget
about it, right? Here in the Hive world, in my experience anyway, it's been
great. Gotta roll with it, be courteous, be persistent, and sometimes
things just work out.

Getting back to Spark and Tez: yes, by all means, I'm a big Tez user already,
so I was hoping to see what Spark brought to the table, and I didn't want to
diddle around with Spark < 2.0. That's cool. I can live with that not being
nailed down yet. I'll just wait for Hive 2.2 and rattle the cage again! ha!


All good!

Cheers,
Stephen.

On Fri, Mar 17, 2017 at 1:14 PM, Edward Capriolo 
wrote:


Re: hive on spark - version question

2017-03-17 Thread Edward Capriolo
On Fri, Mar 17, 2017 at 2:56 PM, hernan saab 
wrote:

Stephan,

I understand some of your frustration. Remember that many in open source
are volunteering their time. This is why, if you pay a vendor for support of
some software, you might pay 50K a year or $200.00 an hour. If I were your
vendor/consultant, I would have started the clock 10 minutes ago just to
answer this email :). The only "pay" I ever got from Hive is that I can use
it as a resume bullet point, and I wrote a book which pays me royalties.

As it relates specifically to your problem, when you see the trends you are
seeing, it probably means you are in a minority of the user base. Either
you're doing something no one else is doing, you are too cutting edge, or no
one has an easy solution. Hive is moving away from classic MapReduce, and
two other execution engines have been built: Tez and HiveOnSpark. Because we
are open source, we allow people to "scratch an itch"; that is the Apache
way. From time to time, it means something that was added stops being
viable because of lack of support.

I agree with your final assessment, which is that Tez is the most viable
engine for Hive. This is by no means a put-down of the HiveOnSpark work, and
it does not mean it will never be the most viable. By the same token, if the
versions fall out of sync and all that exists is complaints, the viability
speaks for itself.

Remember that keeping two fast-moving things together is no easy chore. I
used to run Hive + Cassandra. Seems easy. Crap, two versions of commons CLI:
shade one version, everything works. Crap, the new Hive release has different
versions of Thrift: shade + patch. Crap, now one of the other dependencies
is incompatible: fork + shade + patch. At some point you have to say to
yourself: if I cannot make critical mass of this solution, such that I am
the only one doing/patching it, then 

Re: hive on spark - version question

2017-03-17 Thread hernan saab
I have been in a similar world of pain. Basically, I tried to use an external
Hive to have user access controls with a Spark engine. In the end, I realized
that it was a better idea to use Apache Tez instead of a Spark engine for my
particular case.

But the journey is what I want to share with you. The big data Apache tools
and libraries such as Hive, Tez, Spark, Hadoop, Parquet, etc. are not as
interchangeable as we would like to think. Only very limited combinations of
very specific versions work together. This is why tools like Ambari can be
useful: Ambari sets a path of version combinations known to work, and the
dirty work is done under the UI.

More often than not, when you try a version that few people have tried, you
will get error messages that derail you and cause you to waste a lot of time.

In addition, this group, as well as many other Apache big data user groups,
provides extremely poor support for users. The answers you usually get are
not even hints toward a solution. In many cryptic ways, their answers usually
translate to "there is nothing I am willing to do about your problem. If I
did, I should get paid."

If you ask your question in the Spark group, they will send you to the Hive
group and vice versa (I can almost guarantee it based on previous experience).

But in hindsight, people who work on these kinds of things typically make more
money than the average developer. If you make more $$s, it makes sense that
learning this stuff is supposed to be harder.

Conclusion: don't try it. Or try using Tez/Hive instead of Spark/Hive if you
are querying large files.
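
Hive picks its engine through the `hive.execution.engine` property (`mr`, `tez` or `spark`), so the "Tez/Hive instead of Spark/Hive" advice can be applied per session. The sketch below is illustrative only: `EngineChooser` and its fallback rule (use Spark only for a Hive/Spark pairing you have verified yourself, otherwise Tez) are hypothetical and simply encode the conclusion above. It builds the `SET` statement as a string and does not connect to any Hive server.

```java
import java.util.Set;

// Hypothetical helper; it only builds the HiveQL text, it does not execute it.
final class EngineChooser {

    // Engine names accepted by Hive's hive.execution.engine property.
    static final Set<String> ENGINES = Set.of("mr", "tez", "spark");

    private EngineChooser() {}

    // Builds a session-level SET statement, rejecting unknown engine names.
    static String setEngine(String engine) {
        if (!ENGINES.contains(engine)) {
            throw new IllegalArgumentException("unknown execution engine: " + engine);
        }
        return "SET hive.execution.engine=" + engine + ";";
    }

    // Fallback policy: Spark only for a pairing verified in-house, Tez otherwise.
    static String choose(boolean sparkPairingVerified) {
        return setEngine(sparkPairingVerified ? "spark" : "tez");
    }

    public static void main(String[] args) {
        System.out.println(choose(false)); // SET hive.execution.engine=tez;
    }
}
```

Keeping the decision in one place means switching back to Spark later (say, once a newer Hive release supports your Spark version) is a one-line change rather than a hunt through scripts.
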
 

On Friday, March 17, 2017 11:33 AM, Stephen Sprague wrote:

Re: hive on spark - version question

2017-03-17 Thread Stephen Sprague
:( gettin' no love on this one. Any SMEs know if Spark 2.1.0 will work
with Hive 2.1.0? That JavaSparkListener class looks like a deal breaker
to me, alas.

thanks in advance.

Cheers,
Stephen.

On Mon, Mar 13, 2017 at 10:32 PM, Stephen Sprague 
wrote:

> hi guys,
> wondering where we stand with Hive On Spark these days?
>
> i'm trying to run Spark 2.1.0 with Hive 2.1.0 (purely coincidental
> versions) and running up against this class not found:
>
> java.lang.NoClassDefFoundError: org/apache/spark/JavaSparkListener
>
>
> searching the Cyber i find this:
> 1. http://stackoverflow.com/questions/41953688/setting-spark-as-default-execution-engine-for-hive
>
> which pretty much describes my situation too and it references this:
>
>
> 2. https://issues.apache.org/jira/browse/SPARK-17563
>
> which indicates a "won't fix" - but does reference this:
>
>
> 3. https://issues.apache.org/jira/browse/HIVE-14029
>
> which looks to be fixed in hive 2.2 - which is not released yet.
>
>
> so if i want to use spark 2.1.0 with hive am i out of luck - until hive
> 2.2?
>
> thanks,
> Stephen.
>
>
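
Before pointing Hive at a new Spark build, it can be cheaper to check whether the class from the stack trace is on the classpath at all than to wait for a query to fail. A minimal sketch using plain `Class.forName`: the class name comes from the `NoClassDefFoundError` above, while `SparkListenerProbe` is a hypothetical helper. Run it with the same classpath Hive would use (for example, with the Spark jars you intend to deploy). The SPARK-17563 "won't fix" referenced above suggests that Spark 2.x jars do not provide this class, which would explain why Hive 2.1.0 fails against them.

```java
// Hypothetical diagnostic; run it with the same classpath Hive would use.
final class SparkListenerProbe {

    // Class name taken from the NoClassDefFoundError reported in this thread.
    static final String LISTENER_CLASS = "org.apache.spark.JavaSparkListener";

    private SparkListenerProbe() {}

    // True if the class resolves; initialize=false skips its static initializers.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, SparkListenerProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        if (isOnClasspath(LISTENER_CLASS)) {
            System.out.println(LISTENER_CLASS + " found");
        } else {
            System.out.println(LISTENER_CLASS
                    + " NOT found; expect NoClassDefFoundError from this Hive/Spark pairing");
        }
    }
}
```

Catching `LinkageError` as well as `ClassNotFoundException` covers the case where the class file exists but one of its own dependencies is missing, which shows up as a linkage failure rather than a missing class.
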