Re: RDD Partitions on HDFS file in Hive on Spark Query

2016-11-22 Thread yeshwanth kumar
stream, gzip codec will uncompress the data. This really is not a Spark thing, but a Hadoop input-format discussion. HTH. On Wed, Nov 23, 2016 at 10:00 AM, yeshwanth kumar <yeshwant...@gmail.com> wrote: > Hi Ayan, > > we have default rack topology. > > -Yeshwanth > Can
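For context, a minimal sketch (not from the thread) of the input-format point above: splittability is a property of the codec, and only codecs implementing SplittableCompressionCodec (e.g. BZip2Codec) can be split across tasks; GzipCodec and SnappyCodec cannot, so one such file becomes one input split. The file path here is a made-up placeholder.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.compress.{CompressionCodecFactory, SplittableCompressionCodec}

object CodecSplittability {
  def main(args: Array[String]): Unit = {
    val factory = new CompressionCodecFactory(new Configuration())
    // Resolves the codec from the file extension (.gz, .bz2, .snappy, ...);
    // returns null for an uncompressed file.
    val codec = factory.getCodec(new Path("hdfs:///data/file.csv.snappy"))
    val splittable = codec != null && codec.isInstanceOf[SplittableCompressionCodec]
    println(s"codec=$codec splittable=$splittable")
  }
}
```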

Re: RDD Partitions on HDFS file in Hive on Spark Query

2016-11-22 Thread yeshwanth kumar
t rack topology? I.e., 225 is in a different rack than 227 or > 228? What does your topology file say? > On 22 Nov 2016 10:14, "yeshwanth kumar" <yeshwant...@gmail.com> wrote: > >> Thanks for your reply, >> >> I can definitely change the underlying compression f

Re: RDD Partitions on HDFS file in Hive on Spark Query

2016-11-21 Thread yeshwanth kumar
file in several > smaller ones. Another alternative would be bzip2 (but slower in general) or > LZO (usually it is not included by default in many distributions). > > On 21 Nov 2016, at 23:17, yeshwanth kumar <yeshwant...@gmail.com> wrote: > > Hi, > > we are running Hiv
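A hedged sketch of that suggestion in Spark (paths and partition count are placeholders, and it assumes the source file was written with Hadoop's SnappyCodec so textFile can decode it): re-encode the data with the splittable bzip2 codec, repartitioning first so the output lands in several smaller files.

```scala
import org.apache.hadoop.io.compress.BZip2Codec
import org.apache.spark.{SparkConf, SparkContext}

object RecompressAsBzip2 {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("recompress-as-bzip2"))
    sc.textFile("hdfs:///warehouse/table/data.csv.snappy") // one partition: snappy is not splittable
      .repartition(4)                                      // spread rows over several output files
      .saveAsTextFile("hdfs:///warehouse/table_bz2", classOf[BZip2Codec])
    sc.stop()
  }
}
```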

RDD Partitions on HDFS file in Hive on Spark Query

2016-11-21 Thread yeshwanth kumar
Hi, we are running Hive on Spark. We have an external table over a Snappy-compressed CSV file of size 917.4 MB, and the HDFS block size is set to 256 MB. As per my understanding, if I run a query over that external table, it should launch 4 tasks, one for each block, but I am seeing one executor and one
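A minimal way to check the behaviour being asked about (table path assumed): inspect the partition count Spark actually derives. For a splittable format, 917.4 MB over 256 MB blocks would give 4 input splits; a non-splittable codec such as Snappy collapses the whole file into one.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object PartitionCount {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("partition-count"))
    val rdd = sc.textFile("hdfs:///warehouse/external_table/data.csv.snappy")
    // Prints 1 for a single non-splittable file, regardless of block count.
    println(s"partitions = ${rdd.getNumPartitions}")
    sc.stop()
  }
}
```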

How to generate a sequential key in rdd across executors

2016-07-23 Thread yeshwanth kumar
Hi, I am doing a bulk load to HBase using Spark, in which I need to generate a sequential key for each record; the key should be sequential across all the executors. I tried zipWithIndex; it didn't work because zipWithIndex gives an index per executor, not across all executors. Looking for some
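For reference, a small sketch with toy data: RDD.zipWithIndex (as opposed to zipWithUniqueId, whose ids are unique but not consecutive) runs an extra job to compute per-partition offsets, so its indices are consecutive across all partitions and executors, ordered by partition.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SequentialKeys {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("sequential-keys"))
    val records = sc.parallelize(Seq("a", "b", "c", "d"), 4) // one element per partition
    // zipWithIndex yields (element, 0L) ... (element, 3L) with no gaps,
    // because Spark first counts each partition and offsets the indices.
    records.zipWithIndex()
      .map { case (rec, idx) => (idx, rec) }
      .collect()
      .foreach(println)
    sc.stop()
  }
}
```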

Spark HBase bulk load using hfile format

2016-07-13 Thread yeshwanth kumar
Hi, I am doing a bulk load into HBase in HFile format, using saveAsNewAPIHadoopFile. When I try to write, I am getting an exception: java.io.IOException: Added a key not lexically larger than previous. The following is the code snippet: case class HBaseRow(rowKey: ImmutableBytesWritable, kv: KeyValue)
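This IOException usually means cells reach HFileOutputFormat2 out of rowkey order, since HFiles must be written in total sorted order. Below is a hedged sketch of the common fix, not the poster's actual code: sort on the raw rowkey bytes before building the Writables (plain tuples shuffle cleanly, unlike the non-serializable Writable/KeyValue classes). The `cells` RDD and output path are assumptions; a real bulk load would also call HFileOutputFormat2.configureIncrementalLoad so the output partitioning matches the table's region boundaries, and rows with several columns need their cells ordered by family/qualifier as well.

```scala
import org.apache.hadoop.hbase.KeyValue
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.rdd.RDD

object HFileWrite {
  // (rowkey, column family, qualifier, value) -- all raw bytes
  type Cell4 = (Array[Byte], Array[Byte], Array[Byte], Array[Byte])

  def writeSorted(cells: RDD[Cell4], out: String): Unit = {
    // HFiles require a total lexical order on the rowkey.
    implicit val byteOrd: Ordering[Array[Byte]] =
      Ordering.comparatorToOrdering(Bytes.BYTES_COMPARATOR)

    cells
      .map { case c @ (rk, _, _, _) => (rk, c) }
      .sortByKey()
      .map { case (rk, (_, cf, q, v)) =>
        (new ImmutableBytesWritable(rk), new KeyValue(rk, cf, q, v))
      }
      .saveAsNewAPIHadoopFile(
        out,
        classOf[ImmutableBytesWritable],
        classOf[KeyValue],
        classOf[HFileOutputFormat2])
  }
}
```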