Need Help in Spark Hive Data Processing
Hi,

I am a new user to Spark. I am trying to use Spark to process a large amount of Hive data using Spark DataFrames.

I have a 5-node Spark cluster, each node with 30 GB of memory. I want to process a Hive table with 450 GB of data using DataFrames. Fetching a single row from the Hive table takes 36 minutes. Please suggest what is wrong here; any help is appreciated.

Thanks,
Bala
Re: Need Help in Spark Hive Data Processing
It depends on how you fetch the single row. Is your query complex?

On Thu, Jan 7, 2016 at 12:47 PM, Balaraju.Kagidala Kagidala <balaraju.kagid...@gmail.com> wrote:

--
Best Regards

Jeff Zhang
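To illustrate why "how you fetch the single row" matters: the table and column names below are assumptions, not taken from the thread. On an unpartitioned text-format table, the first query forces Spark to scan all 450 GB even though only one row is returned; if the table were partitioned on a column used in the filter, the second query could prune down to a single partition.

```sql
-- Full scan: every row must be read to evaluate the predicate
-- (table and column names are hypothetical)
SELECT * FROM sales WHERE customer_id = 12345 LIMIT 1;

-- Partition pruning: only the matching partition directory is read,
-- assuming the table is partitioned by event_date
SELECT *
FROM sales
WHERE event_date = '2016-01-07'
  AND customer_id = 12345
LIMIT 1;
```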
Re: Need Help in Spark Hive Data Processing
You need the table in an efficient format, such as ORC or Parquet. Have the table sorted appropriately (hint: on the most discriminating column in the WHERE clause). Do not use SAN storage or virtualization for the slave nodes. Can you please post your query?

I always recommend avoiding single-row lookups where possible. They are very inefficient in analytics scenarios; this is somewhat true in the traditional database world as well (depending on the use case, of course).

> On 07 Jan 2016, at 05:47, Balaraju.Kagidala Kagidala wrote:
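As a sketch of the ORC-plus-sorting advice above (all table, column, and partition names are hypothetical, and the bucket count is an arbitrary illustration):

```sql
-- Store the data in ORC, partitioned on a column commonly used in
-- WHERE clauses, and sorted/bucketed on the point-lookup key
CREATE TABLE sales_orc (
  customer_id BIGINT,
  amount      DOUBLE
)
PARTITIONED BY (event_date STRING)
CLUSTERED BY (customer_id) SORTED BY (customer_id) INTO 32 BUCKETS
STORED AS ORC;

-- Copy from the existing table; dynamic partitioning must be enabled
SET hive.exec.dynamic.partition.mode = nonstrict;
INSERT OVERWRITE TABLE sales_orc PARTITION (event_date)
SELECT customer_id, amount, event_date FROM sales;
```

With ORC (or Parquet), predicate pushdown lets the reader skip stripes or row groups whose min/max statistics exclude the filtered value, so a point lookup no longer has to read the entire table.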