Re: Spark SQL vs HiveQL
Thanks for responding, but I would not be reading from a file if it were Hive. I'm comparing Hive LLAP reading from a Hive table vs. Spark SQL reading from a file. That is the question.

Thanks

On Mon, Aug 28, 2017 at 1:58 PM, Imran Rajjad wrote:
> If reading directly from file then Spark SQL should be your choice
Re: Spark SQL vs HiveQL
If you are reading directly from a file, then Spark SQL should be your choice.

On Mon, Aug 28, 2017 at 10:25 PM Michael Artz wrote:
> Just to be clear, I'm referring to having Spark reading from a file, not
> from a Hive table. And it will have Tungsten engine off-heap serialization
> after 2.1, so if it was a test with something like 1.6.3 it won't be as helpful.
Re: Spark SQL vs HiveQL
Just to be clear, I'm referring to having Spark reading from a file, not from a Hive table. And it will have Tungsten engine off-heap serialization after 2.1, so if it was a test with something like 1.6.3 it won't be as helpful.

On Mon, Aug 28, 2017 at 10:50 AM, Michael Artz wrote:
> Hi,
> There isn't any good source to answer the question: is Hive, as a
> SQL-on-Hadoop engine, just as fast as Spark SQL now? I just want to know if
> there has been a comparison done lately for HiveQL vs Spark SQL on Spark
> versions 2.1 or later. I have a large ETL process, with many table joins
> and some string manipulation. I don't think anyone has done this kind of
> testing in a while. With Hive LLAP being so performant, I am trying to
> make the case for using Spark, and some of the architects are light on
> experience so they are scared of Scala.
>
> Thanks