Need Help in Spark Hive Data Processing
Hi,

I am a new user to Spark. I am trying to use Spark to process a large amount of Hive data using Spark DataFrames.

I have a 5-node Spark cluster, each node with 30 GB of memory. I want to process a Hive table with 450 GB of data using DataFrames. Fetching a single row from the Hive table takes 36 minutes. Please suggest what is wrong here; any help is appreciated.

Thanks,
Bala
Re: Need Help in Spark Hive Data Processing
It depends on how you fetch the single row. Is your query complex?

On Thu, Jan 7, 2016 at 12:47 PM, Balaraju.Kagidala Kagidala <balaraju.kagid...@gmail.com> wrote:

--
Best Regards

Jeff Zhang
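To illustrate why "how you fetch the single row" matters: the table and column names below are assumptions, not taken from the thread. On an unpartitioned text-format table, the first query forces Spark to scan all 450 GB even though only one row is returned; if the table were partitioned on a column used in the filter, the second query could prune down to a single partition.

```sql
-- Full scan: every row must be read to evaluate the predicate
-- (table and column names are hypothetical)
SELECT * FROM sales WHERE customer_id = 12345 LIMIT 1;

-- Partition pruning: only the matching partition directory is read,
-- assuming the table is partitioned by event_date
SELECT *
FROM sales
WHERE event_date = '2016-01-07'
  AND customer_id = 12345
LIMIT 1;
```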
Re: Need Help in Spark Hive Data Processing
You need the table in an efficient format, such as ORC or Parquet. Have the table sorted appropriately (hint: on the most discriminating column in the WHERE clause). Do not use SAN storage or virtualization for the slave nodes. Can you please post your query?

I always recommend avoiding single-row lookups where possible. They are very inefficient in analytics scenarios; this is somewhat true in the traditional database world as well (depending on the use case, of course).

> On 07 Jan 2016, at 05:47, Balaraju.Kagidala Kagidala wrote:
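As a sketch of the ORC-plus-sorting advice above (all table, column, and partition names are hypothetical, and the bucket count is an arbitrary illustration):

```sql
-- Store the data in ORC, partitioned on a column commonly used in
-- WHERE clauses, and sorted/bucketed on the point-lookup key
CREATE TABLE sales_orc (
  customer_id BIGINT,
  amount      DOUBLE
)
PARTITIONED BY (event_date STRING)
CLUSTERED BY (customer_id) SORTED BY (customer_id) INTO 32 BUCKETS
STORED AS ORC;

-- Copy from the existing table; dynamic partitioning must be enabled
SET hive.exec.dynamic.partition.mode = nonstrict;
INSERT OVERWRITE TABLE sales_orc PARTITION (event_date)
SELECT customer_id, amount, event_date FROM sales;
```

With ORC (or Parquet), predicate pushdown lets the reader skip stripes or row groups whose min/max statistics exclude the filtered value, so a point lookup no longer has to read the entire table.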