Hi Ayan,
Thanks for the explanation;
I am aware of compression codecs.
How is the locality level set?
Is it done by Spark or by YARN?
Please let me know,
Thanks,
Yesh
On Nov 22, 2016 5:13 PM, "ayan guha" wrote:
Hi
RACK_LOCAL = task running on the same rack, but not on the same node where
the data is.
NODE_LOCAL = task and data are co-located. Probably you were looking for
this one?
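For intuition, here is a rough Python sketch of how a scheduler could classify a task's locality against its data. This is illustrative only, not Spark's actual TaskSetManager code (which also involves spark.locality.wait timeouts); the host names and the rack_of helper below are made up:

```python
# Illustrative sketch only -- not Spark's real scheduling code.
# Classifies where a task would run relative to the HDFS blocks it reads.

def locality_level(preferred_hosts, executor_host, rack_of):
    """preferred_hosts: nodes holding a replica of the task's data."""
    if executor_host in preferred_hosts:
        return "NODE_LOCAL"   # executor sits on a node that has the data
    if any(rack_of(h) == rack_of(executor_host) for h in preferred_hosts):
        return "RACK_LOCAL"   # same rack as a replica, different node
    return "ANY"              # no locality at all

# Hypothetical cluster: node225/node227 share a rack, node228 is elsewhere.
rack_of = {"node225": "r1", "node227": "r1", "node228": "r2"}.get
print(locality_level({"node227"}, "node225", rack_of))  # RACK_LOCAL
```

So seeing RACK_LOCAL means the task ran on a node without a replica, but in the same rack as one.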
GZIP - reads go through the GZIP codec, but because it is non-splittable, you
can have at most one task reading a gzip file. Now, the
Hi Ayan,
We have the default rack topology.
-Yeshwanth
Can you Imagine what I would do if I could do all I can - Art of War
On Tue, Nov 22, 2016 at 6:37 AM, ayan guha wrote:
Because snappy is not splittable, a single task makes sense.
Are you sure about the rack topology? I.e., is 225 in a different rack than
227 or 228? What does your topology file say?
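For context, the "topology file" refers to Hadoop's rack-awareness configuration: a script or mapping (net.topology.script.file.name) that resolves each host to a rack path. A minimal Python sketch of that mapping, with an entirely made-up host-to-rack table:

```python
# Sketch of the mapping a Hadoop rack-topology script performs: each host
# (or IP) resolves to a rack path. The table below is hypothetical.
RACKS = {
    "node225": "/rack1",
    "node227": "/rack1",
    "node228": "/rack2",
}

def resolve(hosts):
    # Hosts missing from the table fall back to Hadoop's /default-rack.
    return [RACKS.get(h, "/default-rack") for h in hosts]

print(resolve(["node225", "node228", "node999"]))
# ['/rack1', '/rack2', '/default-rack']
```

With the default topology (no script configured), every node resolves to /default-rack, so "same rack" is trivially true for all nodes.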
On 22 Nov 2016 10:14, "yeshwanth kumar" wrote:
Thanks for your reply,
I can definitely change the underlying compression format,
but I am trying to understand the Locality Level:
why did the executor run on a different node, where the blocks are not
present, when the Locality Level is RACK_LOCAL?
Can you shed some light on this?
Thanks,
Yesh
Use ORC, Parquet, or Avro as the format, because they support any compression
type with parallel processing. Alternatively, split your file into several
smaller ones. Another alternative would be bzip2 (though generally slower) or
LZO (usually not included by default in many distributions).
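A sketch of the "split into smaller files" option, using only the Python standard library: rewrite one big CSV as N independently gzipped parts, so even a non-splittable codec yields N single-task inputs (one per file). The data and part count here are made up; on a real cluster you would then upload the parts back to HDFS.

```python
import gzip
import io
import math

# Split rows into n_parts and gzip each part separately; each resulting
# .gz file can then be read by its own task, even though gzip itself is
# not splittable.
def split_and_gzip(lines, n_parts):
    per_part = math.ceil(len(lines) / n_parts)
    parts = []
    for i in range(n_parts):
        chunk = lines[i * per_part:(i + 1) * per_part]
        buf = io.BytesIO()
        with gzip.open(buf, "wt") as f:
            f.writelines(chunk)
        parts.append(buf.getvalue())
    return parts

rows = [f"{i},some_value\n" for i in range(1000)]
parts = split_and_gzip(rows, 4)
print(len(parts))  # 4 independently compressed parts
```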
Try changing the compression to bzip2 or LZO. For reference -
http://comphadoop.weebly.com
Thanks,
Aniket
On Mon, Nov 21, 2016, 10:18 PM yeshwanth kumar
wrote:
Hi,
we are running Hive on Spark; we have an external table over a
snappy-compressed CSV file of size 917.4 MB.
The HDFS block size is set to 256 MB.
As per my understanding, if I run a query over that external table, it
should launch 4 tasks, one for each block,
but I am seeing one executor and one
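For what it's worth, the expected-task arithmetic from the question above can be sketched as follows. This is a rule of thumb, not Hadoop's exact InputFormat split logic:

```python
import math

# Rough rule of thumb for how many map tasks a single input file yields.
def expected_tasks(file_size_mb, block_size_mb, splittable):
    if not splittable:
        # e.g. one snappy/gzip CSV: a single task reads the whole file
        return 1
    return math.ceil(file_size_mb / block_size_mb)

print(expected_tasks(917.4, 256, splittable=True))   # 4 -- the expectation
print(expected_tasks(917.4, 256, splittable=False))  # 1 -- what was observed
```

The mismatch in the thread comes entirely from the `splittable` flag: snappy-compressed text is read as one unsplittable stream, so the four 256 MB blocks collapse into a single task.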