[Impala-ASF-CR] IMPALA-3613: Avoid topic updates to unregistered subscriber instances

2017-11-15 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8449 )

Change subject: IMPALA-3613: Avoid topic updates to unregistered subscriber 
instances
..


Patch Set 9: Code-Review+2

Carrying +2.


--
To view, visit http://gerrit.cloudera.org:8080/8449
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
Gerrit-Change-Number: 8449
Gerrit-PatchSet: 9
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Wed, 15 Nov 2017 20:31:53 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3613: Avoid topic updates to unregistered subscriber instances

2017-11-15 Thread Bharath Vissapragada (Code Review)
Hello Sailesh Mukil, Dimitris Tsirogiannis, Alex Behm, Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8449

to look at the new patch set (#9).

Change subject: IMPALA-3613: Avoid topic updates to unregistered subscriber 
instances
..

IMPALA-3613: Avoid topic updates to unregistered subscriber instances

Bug:

Without this patch, when a subscriber repeatedly reconnects to the
statestore, the latter queues the initial heartbeat message and a
bunch of topic updates to every instance of the registered subscriber.
These queued updates are eventually picked up by the heartbeating/topic
update threads and the corresponding RPCs are made to the subscribers.
The subscriber then rejects these updates since they were meant for an
earlier registration. This is usually possible if the subscriber has
some network problems leading to failing RPCs.

Such a node is eventually marked by the statestore as bad, but depending
on the configurations set, the issue can snowball into DDOS kind of
attack when the entire thread pool of heartbeating/topic updates is
filled with instances from the problematic host. This can result in
the statestore missing timely heartbeats to other subscribers making
them reconnect. This worsens the situation and the resulting topic
updates for the reconnects will fully saturate the network on the
statestore host, until the statestore daemon is restarted.

Fix:

This patch maps topic updates/heartbeats to a specific subscriber
registered instance rather to a subscriber id (that stays same across
reconnects). That way, when we encounter a topic update that was meant to
a stale subscriber, we can simply reject it.

Testing:

Tested this locally by adding relevant logging. I made the subscribers
to reconnect aggressively(a) and delaying heartbeats from the statestore
side (b,c).

(a) --statestore_subscriber_timeout_seconds=1
(b) --statestore_max_missed_heartbeats=1000
(c) --statestore_heartbeat_frequency_ms=6

Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
---
M be/src/statestore/statestore-subscriber.cc
M be/src/statestore/statestore-subscriber.h
M be/src/statestore/statestore.cc
M be/src/statestore/statestore.h
4 files changed, 86 insertions(+), 45 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/8449/9
--
To view, visit http://gerrit.cloudera.org:8080/8449
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
Gerrit-Change-Number: 8449
Gerrit-PatchSet: 9
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Sailesh Mukil 


[Impala-ASF-CR] IMPALA-3613: Avoid topic updates to unregistered subscriber instances

2017-11-15 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8449 )

Change subject: IMPALA-3613: Avoid topic updates to unregistered subscriber 
instances
..


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8449/8/be/src/statestore/statestore.h
File be/src/statestore/statestore.h:

http://gerrit.cloudera.org:8080/#/c/8449/8/be/src/statestore/statestore.h@381
PS8, Line 381:  both kinds of subscriber up
> or just say "in Unix time"
Done


http://gerrit.cloudera.org:8080/#/c/8449/8/be/src/statestore/statestore.cc
File be/src/statestore/statestore.cc:

http://gerrit.cloudera.org:8080/#/c/8449/8/be/src/statestore/statestore.cc@415
PS8, Line 415: Id& registration_id, share
> Seems like this should just be "FindSubscriber()" or "FindRegisteredSubscri
FindSubscriber() sounds better.



--
To view, visit http://gerrit.cloudera.org:8080/8449
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
Gerrit-Change-Number: 8449
Gerrit-PatchSet: 3
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Wed, 15 Nov 2017 20:28:48 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3613: Avoid topic updates to unregistered subscriber instances

2017-11-14 Thread Bharath Vissapragada (Code Review)
Hello Sailesh Mukil, Dimitris Tsirogiannis, Alex Behm, Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8449

to look at the new patch set (#8).

Change subject: IMPALA-3613: Avoid topic updates to unregistered subscriber 
instances
..

IMPALA-3613: Avoid topic updates to unregistered subscriber instances

Bug:

Without this patch, when a subscriber repeatedly reconnects to the
statestore, the latter queues the initial heartbeat message and a
bunch of topic updates to every instance of the registered subscriber.
These queued updates are eventually picked up by the heartbeating/topic
update threads and the corresponding RPCs are made to the subscribers.
The subscriber then rejects these updates since they were meant for an
earlier registration. This is usually possible if the subscriber has
some network problems leading to failing RPCs.

Such a node is eventually marked by the statestore as bad, but depending
on the configurations set, the issue can snowball into DDOS kind of
attack when the entire thread pool of heartbeating/topic updates is
filled with instances from the problematic host. This can result in
the statestore missing timely heartbeats to other subscribers making
them reconnect. This worsens the situation and the resulting topic
updates for the reconnects will fully saturate the network on the
statestore host, until the statestore daemon is restarted.

Fix:

This patch maps topic updates/heartbeats to a specific subscriber
registered instance rather to a subscriber id (that stays same across
reconnects). That way, when we encounter a topic update that was meant to
a stale subscriber, we can simply reject it.

Testing:

Tested this locally by adding relevant logging. I made the subscribers
to reconnect aggressively(a) and delaying heartbeats from the statestore
side (b,c).

(a) --statestore_subscriber_timeout_seconds=1
(b) --statestore_max_missed_heartbeats=1000
(c) --statestore_heartbeat_frequency_ms=6

Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
---
M be/src/statestore/statestore-subscriber.cc
M be/src/statestore/statestore-subscriber.h
M be/src/statestore/statestore.cc
M be/src/statestore/statestore.h
4 files changed, 88 insertions(+), 45 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/8449/8
--
To view, visit http://gerrit.cloudera.org:8080/8449
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
Gerrit-Change-Number: 8449
Gerrit-PatchSet: 8
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Sailesh Mukil 


[Impala-ASF-CR] IMPALA-3613: Avoid topic updates to unregistered subscriber instances

2017-11-14 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8449 )

Change subject: IMPALA-3613: Avoid topic updates to unregistered subscriber 
instances
..


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8449/7/be/src/statestore/statestore.h
File be/src/statestore/statestore.h:

http://gerrit.cloudera.org:8080/#/c/8449/7/be/src/statestore/statestore.h@394
PS7, Line 394:   typedef std::pair 
ScheduledSubscriberUpdate;
> I meant flatten both pairs -- i.e. turn ScheduledSubscriberUpdate into a st
oops sorry, got confused because your comment just highlighted the second part 
of the pair. Redid this, makes more sense to flatten the whole thing.



--
To view, visit http://gerrit.cloudera.org:8080/8449
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
Gerrit-Change-Number: 8449
Gerrit-PatchSet: 7
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Wed, 15 Nov 2017 02:27:10 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3613: Avoid topic updates to unregistered subscriber instances

2017-11-13 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8449 )

Change subject: IMPALA-3613: Avoid topic updates to unregistered subscriber 
instances
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8449/6/be/src/statestore/statestore.h
File be/src/statestore/statestore.h:

http://gerrit.cloudera.org:8080/#/c/8449/6/be/src/statestore/statestore.h@383
PS6, Line 383: std::pair
> once we have two level pair, I think it's time to start naming the fields.
Done



--
To view, visit http://gerrit.cloudera.org:8080/8449
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
Gerrit-Change-Number: 8449
Gerrit-PatchSet: 6
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Tue, 14 Nov 2017 00:18:47 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3613: Avoid topic updates to unregistered subscriber instances

2017-11-13 Thread Bharath Vissapragada (Code Review)
Hello Sailesh Mukil, Dimitris Tsirogiannis, Alex Behm, Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8449

to look at the new patch set (#7).

Change subject: IMPALA-3613: Avoid topic updates to unregistered subscriber 
instances
..

IMPALA-3613: Avoid topic updates to unregistered subscriber instances

Bug:

Without this patch, when a subscriber repeatedly reconnects to the
statestore, the latter queues the initial heartbeat message and a
bunch of topic updates to every instance of the registered subscriber.
These queued updates are eventually picked up by the heartbeating/topic
update threads and the corresponding RPCs are made to the subscribers.
The subscriber then rejects these updates since they were meant for an
earlier registration. This is usually possible if the subscriber has
some network problems leading to failing RPCs.

Such a node is eventually marked by the statestore as bad, but depending
on the configurations set, the issue can snowball into DDOS kind of
attack when the entire thread pool of heartbeating/topic updates is
filled with instances from the problematic host. This can result in
the statestore missing timely heartbeats to other subscribers making
them reconnect. This worsens the situation and the resulting topic
updates for the reconnects will fully saturate the network on the
statestore host, until the statestore daemon is restarted.

Fix:

This patch maps topic updates/heartbeats to a specific subscriber
registered instance rather to a subscriber id (that stays same across
reconnects). That way, when we encounter a topic update that was meant to
a stale subscriber, we can simply reject it.

Testing:

Tested this locally by adding relevant logging. I made the subscribers
to reconnect aggressively(a) and delaying heartbeats from the statestore
side (b,c).

(a) --statestore_subscriber_timeout_seconds=1
(b) --statestore_max_missed_heartbeats=1000
(c) --statestore_heartbeat_frequency_ms=6

Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
---
M be/src/statestore/statestore-subscriber.cc
M be/src/statestore/statestore-subscriber.h
M be/src/statestore/statestore.cc
M be/src/statestore/statestore.h
4 files changed, 91 insertions(+), 42 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/8449/7
--
To view, visit http://gerrit.cloudera.org:8080/8449
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
Gerrit-Change-Number: 8449
Gerrit-PatchSet: 7
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Sailesh Mukil 


[Impala-ASF-CR] IMPALA-3613: Avoid topic updates to unregistered subscriber instances

2017-11-08 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8449 )

Change subject: IMPALA-3613: Avoid topic updates to unregistered subscriber 
instances
..


Patch Set 6: Code-Review+1

Carrying +1 (Thanks Sailesh). Any volunteers for a +2 review, thanks.


--
To view, visit http://gerrit.cloudera.org:8080/8449
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
Gerrit-Change-Number: 8449
Gerrit-PatchSet: 6
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Wed, 08 Nov 2017 21:08:40 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3613: Avoid topic updates to unregistered subscriber instances

2017-11-06 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8449 )

Change subject: IMPALA-3613: Avoid topic updates to unregistered subscriber 
instances
..


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8449/3/be/src/statestore/statestore.h
File be/src/statestore/statestore.h:

http://gerrit.cloudera.org:8080/#/c/8449/3/be/src/statestore/statestore.h@385
PS3, Line 385: te;
> I was chatting with Michael and he mentioned that in general we don't recom
Discussed this a little more with Dimitris, leaving it as-is for now. We didn't 
want to extend the usage of shared/weak_ptrs for readability sake.


http://gerrit.cloudera.org:8080/#/c/8449/3/be/src/statestore/statestore.cc
File be/src/statestore/statestore.cc:

http://gerrit.cloudera.org:8080/#/c/8449/3/be/src/statestore/statestore.cc@278
PS3, Line 278:   lock_guard l(subscribers_lock_);
 :   lock_guard t(topic_lock_);
> My general opinion on this is "if it ain't broke, don't fix it". Do we have
Fair point, I'll revert the spinlock change. Maybe we can address it again if 
it really turns out to be a bottleneck.



--
To view, visit http://gerrit.cloudera.org:8080/8449
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
Gerrit-Change-Number: 8449
Gerrit-PatchSet: 5
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Mon, 06 Nov 2017 23:55:20 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3613: Avoid topic updates to unregistered subscriber instances

2017-11-06 Thread Bharath Vissapragada (Code Review)
Hello Sailesh Mukil, Dimitris Tsirogiannis, Alex Behm,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8449

to look at the new patch set (#5).

Change subject: IMPALA-3613: Avoid topic updates to unregistered subscriber 
instances
..

IMPALA-3613: Avoid topic updates to unregistered subscriber instances

Bug:

Without this patch, when a subscriber repeatedly reconnects to the
statestore, the latter queues the initial heartbeat message and a
bunch of topic updates to every instance of the registered subscriber.
These queued updates are eventually picked up by the heartbeating/topic
update threads and the corresponding RPCs are made to the subscribers.
The subscriber then rejects these updates since they were meant for an
earlier registration. This is usually possible if the subscriber has
some network problems leading to failing RPCs.

Such a node is eventually marked by the statestore as bad, but depending
on the configurations set, the issue can snowball into DDOS kind of
attack when the entire thread pool of heartbeating/topic updates is
filled with instances from the problematic host. This can result in
the statestore missing timely heartbeats to other subscribers making
them reconnect. This worsens the situation and the resulting topic
updates for the reconnects will fully saturate the network on the
statestore host, until the statestore daemon is restarted.

Fix:

This patch maps topic updates/heartbeats to a specific subscriber
registered instance rather to a subscriber id (that stays same across
reconnects). That way, when we encounter a topic update that was meant to
a stale subscriber, we can simply reject it.

Testing:

Tested this locally by adding relevant logging. I made the subscribers
to reconnect aggressively(a) and delaying heartbeats from the statestore
side (b,c).

(a) --statestore_subscriber_timeout_seconds=1
(b) --statestore_max_missed_heartbeats=1000
(c) --statestore_heartbeat_frequency_ms=6

Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
---
M be/src/statestore/statestore-subscriber.cc
M be/src/statestore/statestore-subscriber.h
M be/src/statestore/statestore.cc
M be/src/statestore/statestore.h
4 files changed, 76 insertions(+), 41 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/8449/5
--
To view, visit http://gerrit.cloudera.org:8080/8449
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
Gerrit-Change-Number: 8449
Gerrit-PatchSet: 5
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Sailesh Mukil 


[Impala-ASF-CR] IMPALA-3613: Avoid topic updates to unregistered subscriber instances

2017-11-06 Thread Bharath Vissapragada (Code Review)
Hello Sailesh Mukil, Dimitris Tsirogiannis,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8449

to look at the new patch set (#4).

Change subject: IMPALA-3613: Avoid topic updates to unregistered subscriber 
instances
..

IMPALA-3613: Avoid topic updates to unregistered subscriber instances

Bug:

Without this patch, when a subscriber repeatedly reconnects to the
statestore, the latter queues the initial heartbeat message and a
bunch of topic updates to every instance of the registered subscriber.
These queued updates are eventually picked up by the heartbeating/topic
update threads and the corresponding RPCs are made to the subscribers.
The subscriber then rejects these updates since they were meant for an
earlier registration. This is usually possible if the subscriber has
some network problems leading to failing RPCs.

Such a node is eventually marked by the statestore as bad, but depending
on the configurations set, the issue can snowball into DDOS kind of
attack when the entire thread pool of heartbeating/topic updates is
filled with instances from the problematic host. This can result in
the statestore missing timely heartbeats to other subscribers making
them reconnect. This worsens the situation and the resulting topic
updates for the reconnects will fully saturate the network on the
statestore host, until the statestore daemon is restarted.

Fix:

This patch maps topic updates/heartbeats to a specific subscriber
registered instance rather to a subscriber id (that stays same across
reconnects). That way, when we encounter a topic update that was meant to
a stale subscriber, we can simply reject it.

Testing:

Tested this locally by adding relevant logging. I made the subscribers
to reconnect aggressively(a) and delaying heartbeats from the statestore
side (b,c).

(a) --statestore_subscriber_timeout_seconds=1
(b) --statestore_max_missed_heartbeats=1000
(c) --statestore_heartbeat_frequency_ms=6

Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
---
M be/src/statestore/statestore-subscriber.cc
M be/src/statestore/statestore-subscriber.h
M be/src/statestore/statestore.cc
M be/src/statestore/statestore.h
4 files changed, 81 insertions(+), 47 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/8449/4
--
To view, visit http://gerrit.cloudera.org:8080/8449
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
Gerrit-Change-Number: 8449
Gerrit-PatchSet: 4
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Sailesh Mukil 


[Impala-ASF-CR] IMPALA-3613: Avoid topic updates to unregistered subscriber instances

2017-11-06 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8449 )

Change subject: IMPALA-3613: Avoid topic updates to unregistered subscriber 
instances
..


Patch Set 3:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/8449/3/be/src/statestore/statestore.h
File be/src/statestore/statestore.h:

http://gerrit.cloudera.org:8080/#/c/8449/3/be/src/statestore/statestore.h@385
PS3, Line 385: std::pair
> I think the code would be much simpler if you stored a pointer (probably a
Wouldn't that keep a bunch of unregistered 'Subscriber' objects around due to 
shared_ptr references? I agree the code might be simple though.


http://gerrit.cloudera.org:8080/#/c/8449/3/be/src/statestore/statestore.cc
File be/src/statestore/statestore.cc:

http://gerrit.cloudera.org:8080/#/c/8449/3/be/src/statestore/statestore.cc@278
PS3, Line 278:   lock_guard l(subscribers_lock_);
 :   lock_guard t(topic_lock_);
> I just noticed this. Getting a SpinLock before getting a mutex is an anti-p
IMO, we shouldn't use spinlock for topic_lock_ since we can potentially do some 
heavy work in GatherTopicUpdates().  If this is an anti-pattern I'm ok 
reverting the change to a mutex. May we can ask others opinions on it?

Dimitris/Dan/Alex do you have any opinion on this?


http://gerrit.cloudera.org:8080/#/c/8449/3/be/src/statestore/statestore.cc@415
PS3, Line 415: const TUniqueId&
> const RegistrationId&
Changed at other places too.



--
To view, visit http://gerrit.cloudera.org:8080/8449
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
Gerrit-Change-Number: 8449
Gerrit-PatchSet: 3
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Mon, 06 Nov 2017 20:38:02 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6144: PublishFilter() continues to run after query failure/cancellation

2017-11-03 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8455 )

Change subject: IMPALA-6144: PublishFilter() continues to run after query 
failure/cancellation
..


Patch Set 2: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/8455
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I400456ad85adb9c23d2d432d772311fa4dcff2ed
Gerrit-Change-Number: 8455
Gerrit-PatchSet: 2
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 03 Nov 2017 21:06:37 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3613: Avoid topic updates to unregistered subscriber instances

2017-11-02 Thread Bharath Vissapragada (Code Review)
Hello Sailesh Mukil,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8449

to look at the new patch set (#3).

Change subject: IMPALA-3613: Avoid topic updates to unregistered subscriber 
instances
..

IMPALA-3613: Avoid topic updates to unregistered subscriber instances

Bug:

Without this patch, when a subscriber repeatedly reconnects to the
statestore, the latter queues the initial heartbeat message and a
bunch of topic updates to every instance of the registered subscriber.
These queued updates are eventually picked up by the heartbeating/topic
update threads and the corresponding RPCs are made to the subscribers.
The subscriber then rejects these updates since they were meant for an
earlier registration. This is usually possible if the subscriber has
some network problems leading to failing RPCs.

Such a node is eventually marked by the statestore as bad, but depending
on the configurations set, the issue can snowball into DDOS kind of
attack when the entire thread pool of heartbeating/topic updates is
filled with instances from the problematic host. This can result in
the statestore missing timely heartbeats to other subscribers making
them reconnect. This worsens the situation and the resulting topic
updates for the reconnects will fully saturate the network on the
statestore host, until the statestore daemon is restarted.

Fix:

This patch maps topic updates/heartbeats to a specific subscriber
registered instance rather to a subscriber id (that stays same across
reconnects). That way, when we encounter a topic update that was meant to
a stale subscriber, we can simply reject it.

Testing:

Tested this locally by adding relevant logging. I made the subscribers
to reconnect aggressively(a) and delaying heartbeats from the statestore
side (b,c).

(a) --statestore_subscriber_timeout_seconds=1
(b) --statestore_max_missed_heartbeats=1000
(c) --statestore_heartbeat_frequency_ms=6

Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
---
M be/src/statestore/statestore-subscriber.cc
M be/src/statestore/statestore-subscriber.h
M be/src/statestore/statestore.cc
M be/src/statestore/statestore.h
4 files changed, 78 insertions(+), 44 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/8449/3
--
To view, visit http://gerrit.cloudera.org:8080/8449
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
Gerrit-Change-Number: 8449
Gerrit-PatchSet: 3
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Sailesh Mukil 


[Impala-ASF-CR] IMPALA-6144: UpdateFilter()/PublishFilter() continue to run after query failure/cancellation

2017-11-02 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8455 )

Change subject: IMPALA-6144: UpdateFilter()/PublishFilter() continue to run 
after query failure/cancellation
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8455/1/be/src/runtime/coordinator-backend-state.cc
File be/src/runtime/coordinator-backend-state.cc:

http://gerrit.cloudera.org:8080/#/c/8455/1/be/src/runtime/coordinator-backend-state.cc@394
PS1, Line 394:   if (!status_.ok()) return;
same


http://gerrit.cloudera.org:8080/#/c/8455/1/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/8455/1/be/src/runtime/coordinator.cc@1082
PS1, Line 1082:   if (!query_status_.ok()) return;
lock_?



--
To view, visit http://gerrit.cloudera.org:8080/8455
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I400456ad85adb9c23d2d432d772311fa4dcff2ed
Gerrit-Change-Number: 8455
Gerrit-PatchSet: 1
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Comment-Date: Fri, 03 Nov 2017 06:44:15 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3613: Avoid topic updates to unregistered subscriber instances

2017-11-02 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8449 )

Change subject: IMPALA-3613: Avoid topic updates to unregistered subscriber 
instances
..


Patch Set 2:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore-subscriber.h
File be/src/statestore/statestore-subscriber.h:

http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore-subscriber.h@165
PS2, Line 165:   typedef TUniqueId RegistrationId;
> You can move this typedef to statestore.h and use the same type in statesto
Done


http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore.h
File be/src/statestore/statestore.h:

http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore.h@383
PS2, Line 383: SubscriberId
>  Where ever the 'SubscriberId' is
 >  required, we just get it from the Subscriber object anyway, and
 >  that object can be retrieved using the unique registration id.

Not sure I follow. If you see OfferUpdate()/DoSubscriberUpdate(), we only keep 
track of ScheduledSubscriberUpdate objects and get the corresponding Subscriber 
based on ScheduledSubscriberUpdate.subscriber_Id. So we don't have a Subscriber 
object handy to get the subscriber_id in all cases.

Also, if we remove SubscriberId everywhere, this means that we need to change 
the subscribers_ map structure to map from RegistrationId -> Subscriber 
objects. Doing so we can't look up by subscriber_id, which is required in 
RegisterSubscriber() (At that point, we don't assign a RegistrationId yet to 
the new instance)

SubscriberMap::iterator subscriber_it = subscribers_.find(subscriber_id);
if (subscriber_it != subscribers_.end()) {
  UnregisterSubscriber(subscriber_it->second.get());
}

We can still figure out a way, but it seemed unnecessarily complex to me. 
Thoughts?


http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore.h@481
PS2, Line 481: subscriber exists
> Could you clarify what "exists" means here exactly? It could be confused wi
Done


http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore.h@482
PS2, Line 482: registration_ids
> nit: registration_id
Done


http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore.h@484
PS2, Line 484: std::shared_ptr* subscriber
> Add a comment about what is returned in this out parameter.
clarified


http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore.cc
File be/src/statestore/statestore.cc:

http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore.cc@414
PS2, Line 414: onst SubscriberId& subscriber_id,
 : const TUniqueId& registration_id
> It looks like it just makes sense to pass 'const Subscriber&' here? Is ther
Not sure I understand. We get the subscriber/registration_id from the 
ScheduledSubscriberUpdate object and not the Subscriber object. Do you mean we 
should pass directly pass ScheduledSubscriberUpdate instead? If thats the case, 
the signature seems kinda weird :)


http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore.cc@634
PS2, Line 634:   if (!RegisteredSubscriberExists(subscriber_to_update.first, 
subscriber_to_update.second,
> I'm a little worried that we're contending for a mutex two more times in th
Good point. I too think (theoretically) spinlock is probably a better choice to 
avoid context switching. Also, ~1000 entries seem like a reasonable estimate 
for foreseeable future :). Also, I think for string based hashing of ~1000 
entries, it is reasonable to assume a O(1) average case lookup (even though the 
worst case is O(N)).

I created a microbenchmark to see if the lock type makes a difference. In the 
benchmark, I measured how long it takes to get 100 heartbeats for a given 
subscriber (with heart beating thread pool sizes of 10/100).
I didn't see any noticeable difference for 100 subscribers, but beyond that, 
the test runs into flaky socket connection issues. I admit that this is not 
representative of the real world use case because in real clusters, the 
statestore CPU would be much busier and the context switching could be more 
expensive.



--
To view, visit http://gerrit.cloudera.org:8080/8449
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
Gerrit-Change-Number: 8449
Gerrit-PatchSet: 2
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Fri, 03 Nov 2017 06:22:18 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3613: Avoid topic updates to unregistered subscriber instances

2017-11-01 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8449


Change subject: IMPALA-3613: Avoid topic updates to unregistered subscriber 
instances
..

IMPALA-3613: Avoid topic updates to unregistered subscriber instances

Bug:

Without this patch, when a subscriber repeatedly reconnects to the
statestore, the latter queues the initial heartbeat message and a
bunch of topic updates to every instance of the registered subscriber.
These queued updates are eventually picked up by the heartbeating/topic
update threads and the corresponding RPCs are made to the subscribers.
The subscriber then rejects these updates since they were meant for an
earlier registration. This is usually possible if the subscriber has
some network problems leading to failing RPCs.

Such a node is eventually marked by the statestore as bad, but depending
on the configurations set, the issue can snowball into DDOS kind of
attack when the entire thread pool of heartbeating/topic updates is
filled with instances from the problematic host. This can result in
the statestore missing timely heartbeats to other subscribers making
them reconnect. This worsens the situation and the resulting topic
updates for the reconnects will fully saturate the network on the
statestore host, until the statestore daemon is restarted.

Fix:

This patch maps topic updates/heartbeats to a specific subscriber
registered instance rather to a subscriber id (that stays same across
reconnects). That way, when we encounter a topic update that was meant to
a stale subscriber, we can simply reject it.

Testing:

Tested this locally by adding relevant logging. I made the subscribers
to reconnect aggressively(a) and delaying heartbeats from the statestore
side (b,c).

(a) --statestore_subscriber_timeout_seconds=1
(b) --statestore_max_missed_heartbeats=1000
(c) --statestore_heartbeat_frequency_ms=6

Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
---
M be/src/statestore/statestore-subscriber.cc
M be/src/statestore/statestore-subscriber.h
M be/src/statestore/statestore.cc
M be/src/statestore/statestore.h
4 files changed, 65 insertions(+), 33 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/8449/2
--
To view, visit http://gerrit.cloudera.org:8080/8449
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
Gerrit-Change-Number: 8449
Gerrit-PatchSet: 2
Gerrit-Owner: Bharath Vissapragada 


[Impala-ASF-CR] IMPALA-5564: Release lock during planning. (wip)

2017-11-01 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8434 )

Change subject: IMPALA-5564: Release lock during planning. (wip)
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8434/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8434/1//COMMIT_MSG@17
PS1, Line 17: whereas the web UI would previously block on the lock.
> Looking over https://gerrit.cloudera.org/c/6707/12/be/src/service/impala-se
Yep, we take the lock if the query is not in a CREATED state.

--
Status ImpalaServer::GetRuntimeProfileStr(const TUniqueId& query_id,
const string& user, bool base64_encoded, stringstream* output) {
  DCHECK(output != nullptr);
  // Search for the query id in the active query map
  {
shared_ptr request_state = 
GetClientRequestState(query_id);
if (request_state.get() != nullptr) {
  // For queries in CREATED state, the profile information isn't populated 
yet.
  if (request_state->query_state() == beeswax::QueryState::CREATED) {
return Status::Expected("Query plan is not ready.");
  }
  lock_guard l(*request_state->lock());
--

I agree this is a very important change from a usability/supportability 
perspective, to have a partial profile visible to the user.


http://gerrit.cloudera.org:8080/#/c/8434/1//COMMIT_MSG@26
PS1, Line 26: know the query id
: that's necessary to call GetResultSetMetadata()
> Yes, IMPALA-2568 is one of the eventual goals. This is actually listed as t
I was thinking more along the "profile copy" path Dan suggested, so that we can 
avoid removing the lock during planning. I thought the latter involved more 
work since it could affect cancellations. With the current patch, cancellations 
don't block anymore. when the query is planning, so I think we should make sure 
to get that part right. I'm fine with this approach, if the cancellation works 
without any issues.



--
To view, visit http://gerrit.cloudera.org:8080/8434
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1e3fc4c28d7a5ad8546d48bcd22c03fbce502e5b
Gerrit-Change-Number: 8434
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Wed, 01 Nov 2017 18:41:16 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5564: Release lock during planning. (wip)

2017-10-31 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8434 )

Change subject: IMPALA-5564: Release lock during planning. (wip)
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8434/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8434/1//COMMIT_MSG@17
PS1, Line 17: whereas the web UI would previously block on the lock.
After IMPALA-1972, it doesn't block, it just returns an empty profile.


http://gerrit.cloudera.org:8080/#/c/8434/1//COMMIT_MSG@26
PS1, Line 26: know the query id
: that's necessary to call GetResultSetMetadata()
It looks to me like this is against IMPALA-2568. I guess we eventually want to 
make ExecuteStatement() async, rather than blocking on planning/metadata load. 
We already hit that problem with clients like Hue timing out while loading 
metadata for large tables. If we eventually want to move in that direction, I 
think these changes need to be undone then.



--
To view, visit http://gerrit.cloudera.org:8080/8434
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1e3fc4c28d7a5ad8546d48bcd22c03fbce502e5b
Gerrit-Change-Number: 8434
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Tue, 31 Oct 2017 23:42:20 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3887: Use dfs.namenode.replication.min=3

2017-10-31 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8426 )

Change subject: IMPALA-3887: Use dfs.namenode.replication.min=3
..


Patch Set 2: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8426/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8426/2//COMMIT_MSG@9
PS2, Line 9: and HBase configuration of our
Update?



--
To view, visit http://gerrit.cloudera.org:8080/8426
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie09efdd9512cc4f0e338919af42bcdcede2cfb93
Gerrit-Change-Number: 8426
Gerrit-PatchSet: 2
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Comment-Date: Tue, 31 Oct 2017 16:52:42 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3887: Use dfs.namenode.replication.min=3

2017-10-30 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8426 )

Change subject: IMPALA-3887: Use dfs.namenode.replication.min=3
..


Patch Set 1: Code-Review+2

(1 comment)

Just curious, how does this change affect the total runtime of the 
core/exhaustive test-suite? +2'ing since the change is simple.

http://gerrit.cloudera.org:8080/#/c/8426/1/fe/src/test/resources/hbase-site.xml.template
File fe/src/test/resources/hbase-site.xml.template:

http://gerrit.cloudera.org:8080/#/c/8426/1/fe/src/test/resources/hbase-site.xml.template@38
PS1, Line 38: 
:   
: dfs.namenode.replication.min
: 3
:   
Isn't this redundant? Per my understanding, this only goes into HDFS service 
side settings and clients (HBase in this case) only control dfs.replication.



--
To view, visit http://gerrit.cloudera.org:8080/8426
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie09efdd9512cc4f0e338919af42bcdcede2cfb93
Gerrit-Change-Number: 8426
Gerrit-PatchSet: 1
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Comment-Date: Tue, 31 Oct 2017 05:11:41 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6124: Fix alter table ddl updates and test

2017-10-30 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8411 )

Change subject: IMPALA-6124: Fix alter table ddl updates and test
..


Patch Set 7: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/8411
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3126252e7709304d3e1fa4bb06a0b847180bd6cf
Gerrit-Change-Number: 8411
Gerrit-PatchSet: 7
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Lars Volker 
Gerrit-Comment-Date: Tue, 31 Oct 2017 00:08:56 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6124: Fix alter table ddl updates and test

2017-10-30 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8411 )

Change subject: IMPALA-6124: Fix alter table ddl updates and test
..


Patch Set 6:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/8411/6//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8411/6//COMMIT_MSG@10
PS6, Line 10: makes
remove


http://gerrit.cloudera.org:8080/#/c/8411/6//COMMIT_MSG@11
PS6, Line 11: .
..to be consistent with Hive...?


http://gerrit.cloudera.org:8080/#/c/8411/6/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/8411/6/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@a2784
PS6, Line 2784:
Just for my understanding, this change is only for add/drop partitions right? 
And not for other applyAlterPart() calls.



--
To view, visit http://gerrit.cloudera.org:8080/8411
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3126252e7709304d3e1fa4bb06a0b847180bd6cf
Gerrit-Change-Number: 8411
Gerrit-PatchSet: 6
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Lars Volker 
Gerrit-Comment-Date: Mon, 30 Oct 2017 23:58:19 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5638: [DOCS] Add known issue for Impala-Kudu-Sentry issue

2017-10-30 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8421 )

Change subject: IMPALA-5638: [DOCS] Add known issue for Impala-Kudu-Sentry issue
..


Patch Set 1: Code-Review+2

Apache announcement is already out (CVE-2017-9792)


--
To view, visit http://gerrit.cloudera.org:8080/8421
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93e99aec2fcbc12f94678e60ebb9d150e72fc77d
Gerrit-Change-Number: 8421
Gerrit-PatchSet: 1
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: John Russell 
Gerrit-Comment-Date: Mon, 30 Oct 2017 23:33:53 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6124: Fix alter table ddl updates and test

2017-10-30 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8411 )

Change subject: IMPALA-6124: Fix alter table ddl updates and test
..


Patch Set 3:

Ah, lol ya, (after-before) used to be < 0. Got it.


--
To view, visit http://gerrit.cloudera.org:8080/8411
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3126252e7709304d3e1fa4bb06a0b847180bd6cf
Gerrit-Change-Number: 8411
Gerrit-PatchSet: 3
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Lars Volker 
Gerrit-Comment-Date: Mon, 30 Oct 2017 19:29:09 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6124: Fix alter table ddl updates and test

2017-10-30 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8411 )

Change subject: IMPALA-6124: Fix alter table ddl updates and test
..


Patch Set 3: Code-Review+1

(1 comment)

The patch looks good to me, but I'm still curious how it worked before. Tried 
digging into the HMS code to see how the ddlTime is updated, but no clues.

http://gerrit.cloudera.org:8080/#/c/8411/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/8411/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2769
PS3, Line 2769:  catalog_.updateLastDdlTime(
  :   new TTableName(msTbl.getDbName(), 
msTbl.getTableName()), lastDdlTime);
Unrelated to this test failure, any idea why we do this if the alter fails?



--
To view, visit http://gerrit.cloudera.org:8080/8411
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3126252e7709304d3e1fa4bb06a0b847180bd6cf
Gerrit-Change-Number: 8411
Gerrit-PatchSet: 3
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Lars Volker 
Gerrit-Comment-Date: Mon, 30 Oct 2017 19:02:14 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5473: [DOCS] Document TLS min version & cipher options

2017-10-26 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8401 )

Change subject: IMPALA-5473: [DOCS] Document TLS min version & cipher options
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8401/1/docs/topics/impala_ssl.xml
File docs/topics/impala_ssl.xml:

http://gerrit.cloudera.org:8080/#/c/8401/1/docs/topics/impala_ssl.xml@158
PS1, Line 158: tlsv1: Allow any TLS version of 1.0 
or higher.
mention this the default may be?



--
To view, visit http://gerrit.cloudera.org:8080/8401
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia1705262f8c01e38c616541d1c48f5d0cad5498e
Gerrit-Change-Number: 8401
Gerrit-PatchSet: 1
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 26 Oct 2017 22:06:34 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5429: Multi threaded block metadata loading

2017-10-26 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8235 )

Change subject: IMPALA-5429: Multi threaded block metadata loading
..


Patch Set 11: Code-Review+2

Rebased, carrying +2. Thanks Dimitris.


--
To view, visit http://gerrit.cloudera.org:8080/8235
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I07eaa7151dfc4d56da8db8c2654bd65d8f808481
Gerrit-Change-Number: 8235
Gerrit-PatchSet: 11
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Thu, 26 Oct 2017 18:07:21 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5429: Multi threaded block metadata loading

2017-10-25 Thread Bharath Vissapragada (Code Review)
Hello Jim Apple, Dimitris Tsirogiannis, Mostafa Mokhtar, Alex Behm, Vuk 
Ercegovac,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8235

to look at the new patch set (#10).

Change subject: IMPALA-5429: Multi threaded block metadata loading
..

IMPALA-5429: Multi threaded block metadata loading

Implements multi threaded block metadata loading on the Catalog
server where we fetch block metadata for multiple partitions of a
single table in parallel. Number of threads to load the metadata is
controlled by the following two parameters (set on the Catalog server
startup and applies for each table load)

-max_hdfs_partitions_parallel_load(default=5)
-max_nonhdfs_partitions_parallel_load(default=20)

We use different thread pool sizes for HDFS and non-HDFS tables since
non-HDFS supports much higher throughput of RPC calls for listStatus
/listFiles. Based on our experiments, S3 showed a linear speed up
(up to ~113x) with increasing number of loading threads where as the
HDFS throughput was limited to ~5x in un-secure clusters and up to
~3.7x in secure clusters. We narrowed it down to scalability
bottlenecks in HDFS RPC implementation (HADOOP-14558) on both the
server and the client side.

One thing to note here is that the thread pool based metadata fetching
is implemented only for loading HDFS block metadata and not for loading
HMS partition information. Our experiments showed that while loading
large partitioned tables, ~90% of the time is spent in connecting to NN
and loading the HDFS block information and optimizing the rest ~10% makes
the code unnecessarily complex without much gain.

Additional notes:

- The multithreading approach is implemented for
  * INVALIDATE (loading from scratch),
  * REFRESH (reusing existing md) code paths,
  * ALTER TABLE ADD/RECOVER PARTITIONS.

- This patch makes the implementation of ListMap thread-safe since
we use that data structure as a shared state between multiple partition
metadata loding threads.

Testing and Results:

- This patch doesn't add any new tests since there is enough test
coverage already. Passed core/exhaustive runs with HDFS/S3.

- We noticed up to ~113x speedup on S3 tables(thread_pool_size=160)
and up to ~5x speed up in un-secure HDFS clusters and ~3.7x in secure
HDFS clusters.

- Synthesized the following two large tables on HDFS and S3 and noticed
significant reduction in my test DDL queries.

  (1) 100K partitions + 1 million files
  (2) 80 partitions + 250K files

 100K-PARTITIONS-1M-FILES-CUSTOM-11-REFRESH-PARTITION I -16.4%
 100K-PARTITIONS-1M-FILES-CUSTOM-08-ADD-PARTITION I -17.25%
 80-PARTITIONS-250K-FILES-11-REFRESH-PARTITIONI -23.57%
 80-PARTITIONS-250K-FILES-S3-08-ADD-PARTITION I -23.87%
 80-PARTITIONS-250K-FILES-09-INVALIDATE   I -24.88%
 80-PARTITIONS-250K-FILES-03-RECOVER  I -35.90%
 80-PARTITIONS-250K-FILES-07-REFRESH  I -43.03%
 100K-PARTITIONS-1M-FILES-CUSTOM-12-QUERY-PARTITIONS  I -43.93%
 100K-PARTITIONS-1M-FILES-CUSTOM-05-QUERY-AFTER-INV   I -46.59%
 80-PARTITIONS-250K-FILES-10-REFRESH-AFTER-ADD-PARTITION  I -48.71%
 100K-PARTITIONS-1M-FILES-CUSTOM-07-REFRESH   I -49.02%
 80-PARTITIONS-250K-FILES-05-QUERY-AFTER-INV  I -49.05%
 100K-PARTITIONS-1M-FILES-CUSTOM-10-REFRESH-AFTER-ADD-PARTI -51.87%
 80-PARTITIONS-250K-FILES-S3-03-RECOVER   I -67.17%
 80-PARTITIONS-250K-FILES-S3-05-QUERY-AFTER-INV   I -76.45%
 80-PARTITIONS-250K-FILES-S3-07-REFRESH   I -87.04%
 80-PARTITIONS-250K-FILES-S3-10-REFRESH-AFTER-ADD-PARTI -88.57%

Change-Id: I07eaa7151dfc4d56da8db8c2654bd65d8f808481
---
M be/src/catalog/catalog.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M 
fe/src/main/java/org/apache/impala/catalog/HdfsPartitionLocationCompressor.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/JniCatalog.java
M fe/src/main/java/org/apache/impala/util/ListMap.java
9 files changed, 462 insertions(+), 242 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/35/8235/10
--
To view, visit http://gerrit.cloudera.org:8080/8235
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I07eaa7151dfc4d56da8db8c2654bd65d8f808481
Gerrit-Change-Number: 8235
Gerrit-PatchSet: 10
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-5429: Multi threaded block metadata loading

2017-10-25 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8235 )

Change subject: IMPALA-5429: Multi threaded block metadata loading
..


Patch Set 9:

(16 comments)

http://gerrit.cloudera.org:8080/#/c/8235/9//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8235/9//COMMIT_MSG@42
PS9, Line 42: datastructure
> nit: data structure
Done


http://gerrit.cloudera.org:8080/#/c/8235/9//COMMIT_MSG@50
PS9, Line 50: speed up
> speedup
I see conflicting results. As per webster it is "speedup", but as per Oxford it 
is "speed-up". Now I don't know who to trust, but I'll go with your suggestion 
:)

https://en.oxforddictionaries.com/definition/speed-up
https://www.merriam-webster.com/dictionary/speedup


http://gerrit.cloudera.org:8080/#/c/8235/9/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/8235/9/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@122
PS9, Line 122: Limits the
> "Maximum"
Done


http://gerrit.cloudera.org:8080/#/c/8235/9/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@223
PS9, Line 223: public FileMetadataLoadStats(Path path) { hdfsPath = path; }
> nit: move the ctor below the class variable members.
Done


http://gerrit.cloudera.org:8080/#/c/8235/9/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@228
PS9, Line 228:
> nit: extra space
Done


http://gerrit.cloudera.org:8080/#/c/8235/9/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@232
PS9, Line 232: the
> remove
Done


http://gerrit.cloudera.org:8080/#/c/8235/9/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@248
PS9, Line 248: runnable
> callable
Done


http://gerrit.cloudera.org:8080/#/c/8235/9/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@266
PS9, Line 266: // Computes the file metadata from scratch (by calling 
resetAndLoadFileMetadata())
 : // when reuseFileMd_ is false, else refreshes the existing 
file metadata for
 : // modified files using refreshFileMetadata().
> nit: you can actually remove the comment as it simply describes the code be
Done


http://gerrit.cloudera.org:8080/#/c/8235/9/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@465
PS9, Line 465: hasFileChanged
> I think you need to put back the caching check. Alex made a good point that
Alex to the rescue, good point. Made the change.


http://gerrit.cloudera.org:8080/#/c/8235/9/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@773
PS9, Line 773: loadParitionFileMetadataFromScratch
> "load from scratch" and "reset and load" essentially mean the same thing. M
Done


http://gerrit.cloudera.org:8080/#/c/8235/9/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@792
PS9, Line 792: S3
> non-HDFS (S3/ADLS)
Done


http://gerrit.cloudera.org:8080/#/c/8235/9/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@792
PS9, Line 792: S3
> "the latter"
Done


http://gerrit.cloudera.org:8080/#/c/8235/9/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@844
PS9, Line 844: if (LOG.isTraceEnabled()) {
 : LOG.trace(loadStats.debugString());
 :   }
> nit: single line
Done


http://gerrit.cloudera.org:8080/#/c/8235/9/fe/src/main/java/org/apache/impala/service/BackendConfig.java
File fe/src/main/java/org/apache/impala/service/BackendConfig.java:

http://gerrit.cloudera.org:8080/#/c/8235/9/fe/src/main/java/org/apache/impala/service/BackendConfig.java@71
PS9, Line 71: maxS3PartsParallelLoad
> Why "maxS3..."? Everywhere else we use "maxNonHdfs..".
Yep, forgot to change here.


http://gerrit.cloudera.org:8080/#/c/8235/9/fe/src/main/java/org/apache/impala/util/ListMap.java
File fe/src/main/java/org/apache/impala/util/ListMap.java:

http://gerrit.cloudera.org:8080/#/c/8235/9/fe/src/main/java/org/apache/impala/util/ListMap.java@43
PS9, Line 43: getList() { return list_; }
> Hm, why is list_ exposed to the world? You may want to check who wants this
- I checked the callers and all of them just readers and either iterate / sets 
it inside another thrift object (which is when I think a copy is made).

- Looks like ImmutableList.copyOf() doesn't always make a copy (although 
subject to change), using it here.


http://gerrit.cloudera.org:8080/#/c/8235/9/fe/src/main/java/org/apache/impala/util/ListMap.java@76
PS9, Line 76: synchronized (this) {
> this is equivalent to public synchronized void populate...
oops, yea, updated.



--
To view, visit http://gerrit.cloudera.org:8080/8235
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I07eaa7151dfc4d56da8db8c2654bd65d8f808481
Gerrit-Change-Number: 8235
Gerrit-PatchSet: 9
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada

[Impala-ASF-CR] IMPALA-3998: deprecate --refresh after connect

2017-10-25 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8381 )

Change subject: IMPALA-3998: deprecate --refresh_after_connect
..


Patch Set 1: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/8381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id297f80c0f596a69ef8ecde948812b82d2a5c0fa
Gerrit-Change-Number: 8381
Gerrit-PatchSet: 1
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Wed, 25 Oct 2017 18:13:27 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5429: Multi threaded block metadata loading

2017-10-25 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8235 )

Change subject: IMPALA-5429: Multi threaded block metadata loading
..


Patch Set 9: Code-Review+1

(1 comment)

Thanks Alex, carrying +1. Rebased for the final review.

http://gerrit.cloudera.org:8080/#/c/8235/8/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/8235/8/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@122
PS8, Line 122:   // Limits the number of errors logged when loading partitioned 
tables.
> remove "huge"
Done



--
To view, visit http://gerrit.cloudera.org:8080/8235
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I07eaa7151dfc4d56da8db8c2654bd65d8f808481
Gerrit-Change-Number: 8235
Gerrit-PatchSet: 9
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Wed, 25 Oct 2017 17:46:04 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5429: Multi threaded block metadata loading

2017-10-25 Thread Bharath Vissapragada (Code Review)
Hello Jim Apple, Dimitris Tsirogiannis, Mostafa Mokhtar, Alex Behm, Vuk 
Ercegovac,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8235

to look at the new patch set (#9).

Change subject: IMPALA-5429: Multi threaded block metadata loading
..

IMPALA-5429: Multi threaded block metadata loading

Implements multi threaded block metadata loading on the Catalog
server where we fetch block metadata for multiple partitions of a
single table in parallel. Number of threads to load the metadata is
controlled by the following two parameters (set on the Catalog server
startup and applies for each table load)

-max_hdfs_partitions_parallel_load(default=5)
-max_nonhdfs_partitions_parallel_load(default=20)

We use different thread pool sizes for HDFS and non-HDFS tables since
non-HDFS supports much higher throughput of RPC calls for listStatus
/listFiles. Based on our experiments, S3 showed a linear speed up
(up to ~113x) with increasing number of loading threads where as the
HDFS throughput was limited to ~5x in un-secure clusters and up to
~3.7x in secure clusters. We narrowed it down to scalability
bottlenecks in HDFS RPC implementation (HADOOP-14558) on both the
server and the client side.

One thing to note here is that the thread pool based metadata fetching
is implemented only for loading HDFS block metadata and not for loading
HMS partition information. Our experiments showed that while loading
large partitioned tables, ~90% of the time is spent in connecting to NN
and loading the HDFS block information and optimizing the rest ~10% makes
the code unnecessarily complex without much gain.

Additional notes:

- The multithreading approach is implemented for
  * INVALIDATE (loading from scratch),
  * REFRESH (reusing existing md) code paths,
  * ALTER TABLE ADD/RECOVER PARTITIONS.

- This patch makes the implementation of ListMap thread-safe since
we use that datastructure as a shared state between multiple partition
metadata loding threads.

Testing and Results:

- This patch doesn't add any new tests since there is enough test
coverage already. Passed core/exhaustive runs with HDFS/S3.

- We noticed up to ~113x speed up on S3 tables(thread_pool_size=160)
and up to ~5x speed up in un-secure HDFS clusters and ~3.7x in secure
HDFS clusters.

- Synthesized the following two large tables on HDFS and S3 and noticed
significant reduction in my test DDL queries.

  (1) 100K partitions + 1 million files
  (2) 80 partitions + 250K files

 100K-PARTITIONS-1M-FILES-CUSTOM-11-REFRESH-PARTITION I -16.4%
 100K-PARTITIONS-1M-FILES-CUSTOM-08-ADD-PARTITION I -17.25%
 80-PARTITIONS-250K-FILES-11-REFRESH-PARTITIONI -23.57%
 80-PARTITIONS-250K-FILES-S3-08-ADD-PARTITION I -23.87%
 80-PARTITIONS-250K-FILES-09-INVALIDATE   I -24.88%
 80-PARTITIONS-250K-FILES-03-RECOVER  I -35.90%
 80-PARTITIONS-250K-FILES-07-REFRESH  I -43.03%
 100K-PARTITIONS-1M-FILES-CUSTOM-12-QUERY-PARTITIONS  I -43.93%
 100K-PARTITIONS-1M-FILES-CUSTOM-05-QUERY-AFTER-INV   I -46.59%
 80-PARTITIONS-250K-FILES-10-REFRESH-AFTER-ADD-PARTITION  I -48.71%
 100K-PARTITIONS-1M-FILES-CUSTOM-07-REFRESH   I -49.02%
 80-PARTITIONS-250K-FILES-05-QUERY-AFTER-INV  I -49.05%
 100K-PARTITIONS-1M-FILES-CUSTOM-10-REFRESH-AFTER-ADD-PARTI -51.87%
 80-PARTITIONS-250K-FILES-S3-03-RECOVER   I -67.17%
 80-PARTITIONS-250K-FILES-S3-05-QUERY-AFTER-INV   I -76.45%
 80-PARTITIONS-250K-FILES-S3-07-REFRESH   I -87.04%
 80-PARTITIONS-250K-FILES-S3-10-REFRESH-AFTER-ADD-PARTI -88.57%

Change-Id: I07eaa7151dfc4d56da8db8c2654bd65d8f808481
---
M be/src/catalog/catalog.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M 
fe/src/main/java/org/apache/impala/catalog/HdfsPartitionLocationCompressor.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/JniCatalog.java
M fe/src/main/java/org/apache/impala/util/ListMap.java
9 files changed, 461 insertions(+), 245 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/35/8235/9
--
To view, visit http://gerrit.cloudera.org:8080/8235
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I07eaa7151dfc4d56da8db8c2654bd65d8f808481
Gerrit-Change-Number: 8235
Gerrit-PatchSet: 9
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR](asf-site) Fix broken link to CIDR paper

2017-10-23 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8365 )

Change subject: Fix broken link to CIDR paper
..


Patch Set 1: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/8365
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: asf-site
Gerrit-MessageType: comment
Gerrit-Change-Id: I80f222465803e8c8539541df6f11d8c7808aa335
Gerrit-Change-Number: 8365
Gerrit-PatchSet: 1
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Comment-Date: Tue, 24 Oct 2017 06:11:02 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5429: Multi threaded block metadata loading

2017-10-22 Thread Bharath Vissapragada (Code Review)
Hello Jim Apple, Dimitris Tsirogiannis, Mostafa Mokhtar, Alex Behm, Vuk 
Ercegovac,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8235

to look at the new patch set (#8).

Change subject: IMPALA-5429: Multi threaded block metadata loading
..

IMPALA-5429: Multi threaded block metadata loading

Implements multi threaded block metadata loading on the Catalog
server where we fetch block metadata for multiple partitions of a
single table in parallel. Number of threads to load the metadata is
controlled by the following two parameters (set on the Catalog server
startup and applies for each table load)

-max_hdfs_partitions_parallel_load(default=5)
-max_nonhdfs_partitions_parallel_load(default=20)

We use different thread pool sizes for HDFS and non-HDFS tables since
non-HDFS supports much higher throughput of RPC calls for listStatus
/listFiles. Based on our experiments, S3 showed a linear speed up
(up to ~113x) with increasing number of loading threads where as the
HDFS throughput was limited to ~5x in un-secure clusters and up to
~3.7x in secure clusters. We narrowed it down to scalability
bottlenecks in HDFS RPC implementation (HADOOP-14558) on both the
server and the client side.

One thing to note here is that the thread pool based metadata fetching
is implemented only for loading HDFS block metadata and not for loading
HMS partition information. Our experiments showed that while loading
large partitioned tables, ~90% of the time is spent in connecting to NN
and loading the HDFS block information and optimizing the rest ~10% makes
the code unnecessarily complex without much gain.

Additional notes:

- The multithreading approach is implemented for
  * INVALIDATE (loading from scratch),
  * REFRESH (reusing existing md) code paths,
  * ALTER TABLE ADD/RECOVER PARTITIONS.

- This patch makes the implementation of ListMap thread-safe since
we use that datastructure as a shared state between multiple partition
metadata loding threads.

Testing and Results:

- This patch doesn't add any new tests since there is enough test
coverage already. Passed core/exhaustive runs with HDFS/S3.

- We noticed up to ~113x speed up on S3 tables(thread_pool_size=160)
and up to ~5x speed up in un-secure HDFS clusters and ~3.7x in secure
HDFS clusters.

- Synthesized the following two large tables on HDFS and S3 and noticed
significant reduction in my test DDL queries.

  (1) 100K partitions + 1 million files
  (2) 80 partitions + 250K files

 100K-PARTITIONS-1M-FILES-CUSTOM-11-REFRESH-PARTITION I -16.4%
 100K-PARTITIONS-1M-FILES-CUSTOM-08-ADD-PARTITION I -17.25%
 80-PARTITIONS-250K-FILES-11-REFRESH-PARTITIONI -23.57%
 80-PARTITIONS-250K-FILES-S3-08-ADD-PARTITION I -23.87%
 80-PARTITIONS-250K-FILES-09-INVALIDATE   I -24.88%
 80-PARTITIONS-250K-FILES-03-RECOVER  I -35.90%
 80-PARTITIONS-250K-FILES-07-REFRESH  I -43.03%
 100K-PARTITIONS-1M-FILES-CUSTOM-12-QUERY-PARTITIONS  I -43.93%
 100K-PARTITIONS-1M-FILES-CUSTOM-05-QUERY-AFTER-INV   I -46.59%
 80-PARTITIONS-250K-FILES-10-REFRESH-AFTER-ADD-PARTITION  I -48.71%
 100K-PARTITIONS-1M-FILES-CUSTOM-07-REFRESH   I -49.02%
 80-PARTITIONS-250K-FILES-05-QUERY-AFTER-INV  I -49.05%
 100K-PARTITIONS-1M-FILES-CUSTOM-10-REFRESH-AFTER-ADD-PARTI -51.87%
 80-PARTITIONS-250K-FILES-S3-03-RECOVER   I -67.17%
 80-PARTITIONS-250K-FILES-S3-05-QUERY-AFTER-INV   I -76.45%
 80-PARTITIONS-250K-FILES-S3-07-REFRESH   I -87.04%
 80-PARTITIONS-250K-FILES-S3-10-REFRESH-AFTER-ADD-PARTI -88.57%

Change-Id: I07eaa7151dfc4d56da8db8c2654bd65d8f808481
---
M be/src/catalog/catalog.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M 
fe/src/main/java/org/apache/impala/catalog/HdfsPartitionLocationCompressor.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/JniCatalog.java
M fe/src/main/java/org/apache/impala/util/ListMap.java
9 files changed, 461 insertions(+), 245 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/35/8235/8
--
To view, visit http://gerrit.cloudera.org:8080/8235
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I07eaa7151dfc4d56da8db8c2654bd65d8f808481
Gerrit-Change-Number: 8235
Gerrit-PatchSet: 8
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-5429: Multi threaded block metadata loading

2017-10-22 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8235 )

Change subject: IMPALA-5429: Multi threaded block metadata loading
..


Patch Set 7:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/8235/7/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/8235/7/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@221
PS7, Line 221: // are excluded because they are considered hidden from 
Impala's perspecitve
> Shrink:
Done


http://gerrit.cloudera.org:8080/#/c/8235/7/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@226
PS7, Line 226: // Number of files skipped while computeing the block 
metadata information.
> Shrink:
Done


http://gerrit.cloudera.org:8080/#/c/8235/7/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@248
PS7, Line 248: // If set to true, reloads the block metadata only when the 
files in this path
> reloads the file metadata (let's try to use the new "file metadata" termino
Done. Changed at other places too.


http://gerrit.cloudera.org:8080/#/c/8235/7/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@249
PS7, Line 249: // has changed since last load (more details in 
hasFileChanged()).
> have changed
Done


http://gerrit.cloudera.org:8080/#/c/8235/7/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@785
PS7, Line 785:* much higher throughput of RPC calls for 
listStatus/listFiles. Based on our
> ... RPC calls for listStatus/listFiles. For simplicity, the filesystem type
Done


http://gerrit.cloudera.org:8080/#/c/8235/7/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@791
PS7, Line 791:* This method is not optimized for tables with mixed 
partition types (partitions mapped
> I don'd think we need this much detail, how about folding a simple sentence
Done


http://gerrit.cloudera.org:8080/#/c/8235/7/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@794
PS7, Line 794:* This may not work well with the mixed partition types and 
needs to fixed.
> Don't think this is needed.
Done


http://gerrit.cloudera.org:8080/#/c/8235/7/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@819
PS7, Line 819: // For tables without partitions, we have no block metadata 
to load.
> remove ","
Done


http://gerrit.cloudera.org:8080/#/c/8235/7/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@841
PS7, Line 841:   LOG.error("Encountered an error loading block metadata 
for table: " +
> Could this flood the log when HDFS is under pressure or there is an intermi
Very good point. This can likely flood the log. IMO we should have some stack 
traces of errors for supportability. Added some code to limit the number of 
errors to log (100) and logged the total failed tasks. Please let me know if 
you prefer some other approach.



--
To view, visit http://gerrit.cloudera.org:8080/8235
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I07eaa7151dfc4d56da8db8c2654bd65d8f808481
Gerrit-Change-Number: 8235
Gerrit-PatchSet: 7
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Sun, 22 Oct 2017 20:05:45 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5429: Multi threaded block metadata loading

2017-10-20 Thread Bharath Vissapragada (Code Review)
Hello Jim Apple, Dimitris Tsirogiannis, Mostafa Mokhtar, Alex Behm, Vuk 
Ercegovac,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8235

to look at the new patch set (#7).

Change subject: IMPALA-5429: Multi threaded block metadata loading
..

IMPALA-5429: Multi threaded block metadata loading

Implements multi threaded block metadata loading on the Catalog
server where we fetch block metadata for multiple partitions of a
single table in parallel. Number of threads to load the metadata is
controlled by the following two parameters (set on the Catalog server
startup and applies for each table load)

-max_hdfs_partitions_parallel_load(default=5)
-max_nonhdfs_partitions_parallel_load(default=20)

We use different thread pool sizes for HDFS and non-HDFS tables since
non-HDFS supports much higher throughput of RPC calls for listStatus
/listFiles. Based on our experiments, S3 showed a linear speed up
(up to ~113x) with increasing number of loading threads where as the
HDFS throughput was limited to ~5x in un-secure clusters and up to
~3.7x in secure clusters. We narrowed it down to scalability
bottlenecks in HDFS RPC implementation (HADOOP-14558) on both the
server and the client side.

One thing to note here is that the thread pool based metadata fetching
is implemented only for loading HDFS block metadata and not for loading
HMS partition information. Our experiments showed that while loading
large partitioned tables, ~90% of the time is spent in connecting to NN
and loading the HDFS block information and optimizing the rest ~10% makes
the code unnecessarily complex without much gain.

Additional notes:

- The multithreading approach is implemented for
  * INVALIDATE (loading from scratch),
  * REFRESH (reusing existing md) code paths,
  * ALTER TABLE ADD/RECOVER PARTITIONS.

- This patch makes the implementation of ListMap thread-safe since
we use that datastructure as a shared state between multiple partition
metadata loding threads.

Testing and Results:

- This patch doesn't add any new tests since there is enough test
coverage already. Passed core/exhaustive runs with HDFS/S3.

- We noticed up to ~113x speed up on S3 tables(thread_pool_size=160)
and up to ~5x speed up in un-secure HDFS clusters and ~3.7x in secure
HDFS clusters.

- Synthesized the following two large tables on HDFS and S3 and noticed
significant reduction in my test DDL queries.

  (1) 100K partitions + 1 million files
  (2) 80 partitions + 250K files

 100K-PARTITIONS-1M-FILES-CUSTOM-11-REFRESH-PARTITION I -16.4%
 100K-PARTITIONS-1M-FILES-CUSTOM-08-ADD-PARTITION I -17.25%
 80-PARTITIONS-250K-FILES-11-REFRESH-PARTITIONI -23.57%
 80-PARTITIONS-250K-FILES-S3-08-ADD-PARTITION I -23.87%
 80-PARTITIONS-250K-FILES-09-INVALIDATE   I -24.88%
 80-PARTITIONS-250K-FILES-03-RECOVER  I -35.90%
 80-PARTITIONS-250K-FILES-07-REFRESH  I -43.03%
 100K-PARTITIONS-1M-FILES-CUSTOM-12-QUERY-PARTITIONS  I -43.93%
 100K-PARTITIONS-1M-FILES-CUSTOM-05-QUERY-AFTER-INV   I -46.59%
 80-PARTITIONS-250K-FILES-10-REFRESH-AFTER-ADD-PARTITION  I -48.71%
 100K-PARTITIONS-1M-FILES-CUSTOM-07-REFRESH   I -49.02%
 80-PARTITIONS-250K-FILES-05-QUERY-AFTER-INV  I -49.05%
 100K-PARTITIONS-1M-FILES-CUSTOM-10-REFRESH-AFTER-ADD-PARTI -51.87%
 80-PARTITIONS-250K-FILES-S3-03-RECOVER   I -67.17%
 80-PARTITIONS-250K-FILES-S3-05-QUERY-AFTER-INV   I -76.45%
 80-PARTITIONS-250K-FILES-S3-07-REFRESH   I -87.04%
 80-PARTITIONS-250K-FILES-S3-10-REFRESH-AFTER-ADD-PARTI -88.57%

Change-Id: I07eaa7151dfc4d56da8db8c2654bd65d8f808481
---
M be/src/catalog/catalog.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M 
fe/src/main/java/org/apache/impala/catalog/HdfsPartitionLocationCompressor.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/JniCatalog.java
M fe/src/main/java/org/apache/impala/util/ListMap.java
9 files changed, 437 insertions(+), 238 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/35/8235/7
--
To view, visit http://gerrit.cloudera.org:8080/8235
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I07eaa7151dfc4d56da8db8c2654bd65d8f808481
Gerrit-Change-Number: 8235
Gerrit-PatchSet: 7
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-5429: Multi threaded block metadata loading

2017-10-20 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8235 )

Change subject: IMPALA-5429: Multi threaded block metadata loading
..


Patch Set 6:

(15 comments)

http://gerrit.cloudera.org:8080/#/c/8235/6/be/src/catalog/catalog.cc
File be/src/catalog/catalog.cc:

http://gerrit.cloudera.org:8080/#/c/8235/6/be/src/catalog/catalog.cc@38
PS6, Line 38: DEFINE_int32(max_hdfs_parts_parallel_load, 5,
> Let's spell out "parts" into "partitions" to avoid confusion.
Done


http://gerrit.cloudera.org:8080/#/c/8235/6/be/src/catalog/catalog.cc@40
PS6, Line 40: "tables. Due to HDFS architectural limitations, it is 
unlikely to get a linear "
> In the sentence "Due to HDFS architectural..." what does the "it" after the
Done


http://gerrit.cloudera.org:8080/#/c/8235/6/be/src/catalog/catalog.cc@42
PS6, Line 42: DEFINE_int32(max_s3_parts_parallel_load, 20,
> Does ADLS fall into this or the HDFS bucket? What about other filesystems?
Yes, although I haven't tested it specifically with ADLS. I mentioned that in 
the commit message but it looks like a good thing to reflect that in the config 
name too. I changed it too, let me know if you think we should use a better 
name here.


http://gerrit.cloudera.org:8080/#/c/8235/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/8235/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@216
PS6, Line 216:   private class PathStorageMdLoadingStats {
> Naming seems a little weird. I think "Storage" is too generic and "Block" i
Wfm. Changed it.


http://gerrit.cloudera.org:8080/#/c/8235/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@240
PS6, Line 240: // If set to true, reloads the block metadata only when the 
underlying file
> Please clarify "the underlying file"
Done


http://gerrit.cloudera.org:8080/#/c/8235/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@413
PS6, Line 413: ++loadStats.ignoredFiles;
> I think we should distinguish the hidden file and already-up-to-date file c
Done


http://gerrit.cloudera.org:8080/#/c/8235/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@773
PS6, Line 773:* Returns the thread pool size to load the block metadata of 
this table.
> Command on how we distinguish between HDFS, S3 and other filesystems (using
Done


http://gerrit.cloudera.org:8080/#/c/8235/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@793
PS6, Line 793: return Math.max(Math.min(numPaths, threadPoolSize), 1);
> For my understanding, why is the max() needed here? Is it not be a precondi
For empty tables, numPaths is 0. But yes this should be a preconditions check. 
So I moved it to the caller and added a preconditions check in this method.


http://gerrit.cloudera.org:8080/#/c/8235/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@800
PS6, Line 800:* metadata is reloaded, else it is loaded from scratch.
> Difference between "reloaded" and "loaded from scratch" is not very clear.
Sounds better. Done.


http://gerrit.cloudera.org:8080/#/c/8235/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@804
PS6, Line 804: int numPathsToLoad = partsByPath.keySet().size();
> partsByPath.size()?
Done


http://gerrit.cloudera.org:8080/#/c/8235/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@812
PS6, Line 812:   for (Path p: partsByPath.keySet()) {
> The tasks are ordered randomly. I wonder if submitting the tasks in a clust
Will do. I have two other jiras (based on others' comments) to create, so will 
add this to the list.


http://gerrit.cloudera.org:8080/#/c/8235/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@821
PS6, Line 821: } catch (ExecutionException | InterruptedException e) {
> We still consider the load successful if one of these fails. What do you th
Oops, I forgot to wrap it as a TLE and throw. Thanks for pointing it out. My 
opinion is that a partially loaded table is ok, but we should let the user know 
about it. What do you think?


http://gerrit.cloudera.org:8080/#/c/8235/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@822
PS6, Line 822:   LOG.error("Encountered an error loading block metadata 
for table: " +
> Let's also dump the path
ExecutionException (ex, IOException in this case) includes the path by default 
and LOG.error() logs the full trace?


http://gerrit.cloudera.org:8080/#/c/8235/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@898
PS6, Line 898:* the newly created HdfsPartitions in parallel.
> remove "the"
Done


http://gerrit.cloudera.org:8080/#/c/8235/6/fe/src/main/java/org/apache/impala/util/ListMap.java
File fe/src/main/java/org/apache/impala/util/ListMap.java:

http://gerrit.cloudera.org:8080/#/c/8235/6/fe/src/main/java/org/apache/impala/util/ListMap.java@38
PS6, Line 38:   private L

[Impala-ASF-CR] IMPALA-5429: Multi threaded block metadata loading

2017-10-16 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8235 )

Change subject: IMPALA-5429: Multi threaded block metadata loading
..


Patch Set 6:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/8235/6/be/src/catalog/catalog.cc
File be/src/catalog/catalog.cc:

http://gerrit.cloudera.org:8080/#/c/8235/6/be/src/catalog/catalog.cc@39
PS6, Line 39: (Advanced) Number of threads used to load block metadata for HDFS 
based partitioned "
: "tables. Due to HDFS architectural limitations, it is 
unlikely to get a linear "
: "speed up beyond 5 threads.
> When multiple tables are loaded, should I think about the total number of t
Unfortunately not. With the current design, this queue is actually unbounded. 
num_metadata_loading_threads only applies to the loads happening via 
TableLoadingMgr class and there would be other loads, that happen via DDLs 
(CatalogOpExecutor) like REFRESHES/ADD PARTITIONS etc. The original plan was to 
use the TableLoadingMgr to queue all these loads but we didn't end up doing it 
since it shows up as a regression to the end users (since the DDLs can 
potentially wait in the queue much longer than before). Ideally we could do it 
and increase num_metadata_loading_threads to a large value to mimic the present 
behavior.


http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@783
PS5, Line 783: numPaths) throws Ca
> CONF is the default configuration and its loaded once upfront for the lifet
getFileSystem() by default uses a cache underneath unless we disable it via

CONF.setBoolean("fs.hdfs.impl.disable.cache", true);

But I'm not totally sure if there is a way to optimize beyond that point for 
each partition.


http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@801
PS5, Line 801:
> yes, that answers it. might be useful to try a workload that has the same n
That'd be a good experiment. We are definitely as fast as the slowest partition 
load. Given we are using a thread pool, smaller partitions give up the 
executing thread much quicker and that would be used by the queued 
partitions_to_load.


http://gerrit.cloudera.org:8080/#/c/8235/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/8235/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@818
PS6, Line 818: for (Future task: pendingMdLoadTasks)
> just for my own info-- since this work is triggered by an end-user, how is
Currently we don't support query cancellation for planning queries 
(IMPALA-915). That is a bigger query life-cycle change and needs to done 
separately.



--
To view, visit http://gerrit.cloudera.org:8080/8235
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I07eaa7151dfc4d56da8db8c2654bd65d8f808481
Gerrit-Change-Number: 8235
Gerrit-PatchSet: 6
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Mon, 16 Oct 2017 22:28:54 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5429: Multi threaded block metadata loading

2017-10-16 Thread Bharath Vissapragada (Code Review)
Hello Jim Apple, Dimitris Tsirogiannis, Mostafa Mokhtar, Vuk Ercegovac,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8235

to look at the new patch set (#6).

Change subject: IMPALA-5429: Multi threaded block metadata loading
..

IMPALA-5429: Multi threaded block metadata loading

Implements multi threaded block metadata loading on the Catalog
server where we fetch block metadata for multiple partitions of a
single table in parallel. Number of threads to load the metadata is
controlled by the following two parameters (set on the Catalog server
startup and applies for each table load)

-max_hdfs_parts_parallel_load(default=5)
-max_s3_parts_parallel_load(default=10)

We use different thread pool sizes for HDFS and S3 based tables
since S3 supports much higher throughput of RPC calls for listStatus
/listFiles. Based on our experiments, S3 showed a linear speed up
(up to ~113x) with increasing number of loading threads where as the
HDFS throughput was limited to ~5x in un-secure clusters and up to
~3.7x in secure clusters. We narrowed it down to scalability
bottlenecks in HDFS RPC implementation (HADOOP-14558) on both the
server and the client side.

One thing to note here is that the thread pool based metadata fetching
is implemented only for loading HDFS block metadata and not for loading
HMS partition information. Our experiments showed that while loading
large partitioned tables, ~90% of the time is spent in connecting to NN
and loading the HDFS block information and optimizing the rest ~10% makes
the code unnecessarily complex without much gain.

Additional notes:

- The multithreading approach is implemented for
  * INVALIDATE (loading from scratch),
  * REFRESH (reusing existing md) code paths,
  * ALTER TABLE ADD/RECOVER PARTITIONS.

- This patch makes the implementation of ListMap thread-safe since
we use that datastructure as a shared state between multiple partition
metadata loding threads.

- While the configuration param max_s3_parts_parallel_load says s3, it
applies for any FileSystem implementation that doesn't support storage
IDs (like ADLS).

Testing and Results:

- This patch doesn't add any new tests since there is enough test
coverage already. Passed core/exhaustive runs with HDFS/S3.

- We noticed up to ~113x speed up on S3 tables(thread_pool_size=160)
and up to ~5x speed up in un-secure HDFS clusters and ~3.7x in secure
HDFS clusters.

- Synthesized the following two large tables on HDFS and S3 and noticed
significant reduction in my test DDL queries.

  (1) 100K partitions + 1 million files
  (2) 80 partitions + 250K files

 100K-PARTITIONS-1M-FILES-CUSTOM-11-REFRESH-PARTITION I -16.4%
 100K-PARTITIONS-1M-FILES-CUSTOM-08-ADD-PARTITION I -17.25%
 80-PARTITIONS-250K-FILES-11-REFRESH-PARTITIONI -23.57%
 80-PARTITIONS-250K-FILES-S3-08-ADD-PARTITION I -23.87%
 80-PARTITIONS-250K-FILES-09-INVALIDATE   I -24.88%
 80-PARTITIONS-250K-FILES-03-RECOVER  I -35.90%
 80-PARTITIONS-250K-FILES-07-REFRESH  I -43.03%
 100K-PARTITIONS-1M-FILES-CUSTOM-12-QUERY-PARTITIONS  I -43.93%
 100K-PARTITIONS-1M-FILES-CUSTOM-05-QUERY-AFTER-INV   I -46.59%
 80-PARTITIONS-250K-FILES-10-REFRESH-AFTER-ADD-PARTITION  I -48.71%
 100K-PARTITIONS-1M-FILES-CUSTOM-07-REFRESH   I -49.02%
 80-PARTITIONS-250K-FILES-05-QUERY-AFTER-INV  I -49.05%
 100K-PARTITIONS-1M-FILES-CUSTOM-10-REFRESH-AFTER-ADD-PARTI -51.87%
 80-PARTITIONS-250K-FILES-S3-03-RECOVER   I -67.17%
 80-PARTITIONS-250K-FILES-S3-05-QUERY-AFTER-INV   I -76.45%
 80-PARTITIONS-250K-FILES-S3-07-REFRESH   I -87.04%
 80-PARTITIONS-250K-FILES-S3-10-REFRESH-AFTER-ADD-PARTI -88.57%

Change-Id: I07eaa7151dfc4d56da8db8c2654bd65d8f808481
---
M be/src/catalog/catalog.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M 
fe/src/main/java/org/apache/impala/catalog/HdfsPartitionLocationCompressor.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/JniCatalog.java
M fe/src/main/java/org/apache/impala/util/ListMap.java
9 files changed, 411 insertions(+), 238 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/35/8235/6
--
To view, visit http://gerrit.cloudera.org:8080/8235
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I07eaa7151dfc4d56da8db8c2654bd65d8f808481
Gerrit-Change-Number: 8235
Gerrit-PatchSet: 6
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewe

[Impala-ASF-CR] IMPALA-5429: Multi threaded block metadata loading

2017-10-16 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8235 )

Change subject: IMPALA-5429: Multi threaded block metadata loading
..


Patch Set 5:

(36 comments)

http://gerrit.cloudera.org:8080/#/c/8235/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8235/4//COMMIT_MSG@59
PS4, Line 59:
> I see. So are these literally as simple as, to pick the first one, a single
That is correct.


http://gerrit.cloudera.org:8080/#/c/8235/5/be/src/catalog/catalog.cc
File be/src/catalog/catalog.cc:

http://gerrit.cloudera.org:8080/#/c/8235/5/be/src/catalog/catalog.cc@35
PS5, Line 35: DEFINE_int32(num_metadata_loading_threads, 16,
: "(Advanced) The number of metadata loading threads (degree of 
parallelism) to use "
: "when loading catalog metadata.");
> I'm confused by the commit message which talks about not loading from hms u
Right, this flag controls the number parallel *table* loads. And each table 
loading involves

- Loading HMS metadata (including schema details, partition information etc.)
- Loading/synthesizing block metadata

So this flag is about inter-table parallelism and the commit message talks 
about intra-table parallel loading. Does that make sense? (I clarified the 
content in the commit message a little to reflect this, incase that helps).


http://gerrit.cloudera.org:8080/#/c/8235/5/be/src/catalog/catalog.cc@42
PS5, Line 42: DEFINE_int32(max_s3_parts_parallel_load, 10,
> I would be more aggressive with this parameter and put it at 20.
Done


http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@215
PS5, Line 215:   // Block metadata loading stats for a single HDFS path.
> nit: File/block (since we're also loading/refreshing files). Also, you may
Done


http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@217
PS5, Line 217: loadedFiles_
> You may want to add a comment here. What is loaded vs refreshed? Is the one
Done


http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@218
PS5, Line 218: _
> I believe the convention is that we don't use '_' for public members.
Done. Was not aware of this.


http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@217
PS5, Line 217: public int loadedFiles_ = 0;
 : public int refreshedFiles_ = 0;
 : public int ignoredFiles_ = 0;
> add comments for these-- see the question regarding refreshedFiles below, f
Yep, the current variables are confusing for sure, redid them and added 
comments.


http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@231
PS5, Line 231: Runnable
> I don't know how this is used later on, but alternatively you can make Path
I settled for Runnable because it looked more cleaner to me with this class 
exposing the debugString() method and only those callers who need the return 
value can access that method. I'm fine either way. I switched to Callable.


http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@247
PS5, Line 247: Blocks on the loadBlockMetadata() call.
> Not following this comment. run() either calls refreshBlockMetadata() or lo
The comment was stale. Updated it.


http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@333
PS5, Line 333: loadBlockMetadata
> I know I am guilty for some of these names but maybe rename this to "resetA
Done


http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@363
PS5, Line 363: numUnknownDiskIds
> Are you overriding or incrementing the value of numUnknownDiskIds in the cr
Incrementing and will use the final value in L369. Tracking the create call 
which eventually calls HdfsPartition.createDiskIds(), it only increments and 
doesn't overwrite.


http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@368
PS5, Line 368: for (HdfsPartition partition: partitions) 
partition.setFileDescriptors(
> Am I misreading this or does each partition get set to the same list of new
That is right. All the HdfsPartitions in 'partitions' map to the same partDir. 
So we update the fileDescs for each of them.


http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@368
PS5, Line 368: newFileDescs
> Hm, that doesn't seem particularly safe (i.e. using the same list for every
Like we discussed, there is a subtle bug here (even without the patch). We can 
see the inconsistency with the following steps

- create table test(a in

[Impala-ASF-CR] IMPALA-5429: Multi threaded block metadata loading

2017-10-10 Thread Bharath Vissapragada (Code Review)
Hello Jim Apple,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8235

to look at the new patch set (#5).

Change subject: IMPALA-5429: Multi threaded block metadata loading
..

IMPALA-5429: Multi threaded block metadata loading

Implements multi threaded block metadata loading on the Catalog
server where we fetch block metadata for multiple partitions
in parallel. Number of threads to load the metadata is controlled
by the following two parameters (set on the Catalog server startup)

-max_hdfs_parts_parallel_load(default=5)
-max_s3_parts_parallel_load(default=10)

We use different thread pool sizes for HDFS and S3 based tables
since S3 supports much higher throughput of RPC calls for listStatus
/listFiles. Based on our experiments, S3 showed a linear speed up
(up to ~113x) with increasing number of loading threads where as the
HDFS throughput was limited to ~5x in un-secure clusters and up to
~3.7x in secure clusters. We narrowed it down to scalability
bottlenecks in HDFS RPC implementation (HADOOP-14558) on both the
server and the client side.

One thing to note here is that the thread pool based metadata fetching
is implemented only for loading HDFS block metadata and not for loading
HMS partition information. Our experiments showed that while loading large
partitioned tables, ~90% of the time is spent in connecting to NN and
loading the HDFS block information and optimizing the rest ~10% makes
the code unnecessarily complex without much gain.

Additional notes:

- The multithreading approach is implemented for
  * INVALIDATE (loading from scratch),
  * REFRESH (reusing existing md) code paths,
  * ALTER TABLE ADD/RECOVER PARTITIONS.

- This patch makes the implementation of ListMap thread-safe since
we use that datastructure as a shared state between multiple partition
metadata loding threads.

- While the configuration param max_s3_parts_parallel_load says s3, it
applies for any FileSystem implementation that doesn't support storage
IDs (like ADLS).

Testing and Results:

- This patch doesn't add any new tests since there is enough test
coverage already. Passed core/exhaustive runs with HDFS/S3.

- We noticed up to ~113x speed up on S3 tables(thread_pool_size=160)
and up to ~5x speed up in un-secure HDFS clusters and ~3.7x in secure
HDFS clusters.

- Synthesized the following two large tables on HDFS and S3 and noticed
significant reduction in my test DDL queries.

  (1) 100K partitions + 1 million files
  (2) 80 partitions + 250K files

 100K-PARTITIONS-1M-FILES-CUSTOM-11-REFRESH-PARTITION I -16.4%
 100K-PARTITIONS-1M-FILES-CUSTOM-08-ADD-PARTITION I -17.25%
 80-PARTITIONS-250K-FILES-11-REFRESH-PARTITIONI -23.57%
 80-PARTITIONS-250K-FILES-S3-08-ADD-PARTITION I -23.87%
 80-PARTITIONS-250K-FILES-09-INVALIDATE   I -24.88%
 80-PARTITIONS-250K-FILES-03-RECOVER  I -35.90%
 80-PARTITIONS-250K-FILES-07-REFRESH  I -43.03%
 100K-PARTITIONS-1M-FILES-CUSTOM-12-QUERY-PARTITIONS  I -43.93%
 100K-PARTITIONS-1M-FILES-CUSTOM-05-QUERY-AFTER-INV   I -46.59%
 80-PARTITIONS-250K-FILES-10-REFRESH-AFTER-ADD-PARTITION  I -48.71%
 100K-PARTITIONS-1M-FILES-CUSTOM-07-REFRESH   I -49.02%
 80-PARTITIONS-250K-FILES-05-QUERY-AFTER-INV  I -49.05%
 100K-PARTITIONS-1M-FILES-CUSTOM-10-REFRESH-AFTER-ADD-PARTI -51.87%
 80-PARTITIONS-250K-FILES-S3-03-RECOVER   I -67.17%
 80-PARTITIONS-250K-FILES-S3-05-QUERY-AFTER-INV   I -76.45%
 80-PARTITIONS-250K-FILES-S3-07-REFRESH   I -87.04%
 80-PARTITIONS-250K-FILES-S3-10-REFRESH-AFTER-ADD-PARTI -88.57%

Change-Id: I07eaa7151dfc4d56da8db8c2654bd65d8f808481
---
M be/src/catalog/catalog.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M 
fe/src/main/java/org/apache/impala/catalog/HdfsPartitionLocationCompressor.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/JniCatalog.java
M fe/src/main/java/org/apache/impala/util/ListMap.java
9 files changed, 386 insertions(+), 212 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/35/8235/5
--
To view, visit http://gerrit.cloudera.org:8080/8235
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I07eaa7151dfc4d56da8db8c2654bd65d8f808481
Gerrit-Change-Number: 8235
Gerrit-PatchSet: 5
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Jim Apple 


[Impala-ASF-CR] IMPALA-5429: Multi threaded block metadata loading

2017-10-10 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8235 )

Change subject: IMPALA-5429: Multi threaded block metadata loading
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8235/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8235/4//COMMIT_MSG@59
PS4, Line 59:  100K-PARTITIONS-1M-FILES-CUSTOM-11-REFRESH-PARTITION I -16.4%
> What are these benchmarks?
Thanks for pointing this out. I should've added more context here. Mostafa 
worked on creating a metadata benchmark by synthesizing large partitioned 
tables and then running some operations that stress the Catalog. Looks like the 
code/scripts are not yet ready for consumption by everyone. I borrowed those 
from him to run these tests. I'll add that information to the commit message.



--
To view, visit http://gerrit.cloudera.org:8080/8235
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I07eaa7151dfc4d56da8db8c2654bd65d8f808481
Gerrit-Change-Number: 8235
Gerrit-PatchSet: 4
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Jim Apple 
Gerrit-Comment-Date: Wed, 11 Oct 2017 00:24:11 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6016: Fix logging in TableLoadingMgr class

2017-10-09 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8212 )

Change subject: IMPALA-6016: Fix logging in TableLoadingMgr class
..


Patch Set 4: Code-Review+2

Running into flaky tests, giving it another shot. Carrying +2.


--
To view, visit http://gerrit.cloudera.org:8080/8212
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I925a6ba9a09be25df2759da5e6d85dfc8b981ce4
Gerrit-Change-Number: 8212
Gerrit-PatchSet: 4
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Comment-Date: Mon, 09 Oct 2017 22:39:06 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5429: Multi threaded block metadata loading

2017-10-09 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8235


Change subject: IMPALA-5429: Multi threaded block metadata loading
..

IMPALA-5429: Multi threaded block metadata loading

Implements multi threaded block metadata loading on the Catalog
server where we fetch block metadata for multiple partitions
in parallel. Number of threads to load the metadata is controlled
by the following two parameters (set on the Catalog server startup)

-max_hdfs_parts_parallel_load(default=5)
-max_s3_parts_parallel_load(default=10)

We use different thread pool sizes for HDFS and S3 based tables
since S3 supports much higher throughput of RPC calls for listStatus
/listFiles. Based on our experiments, S3 showed a linear speed up
(up to ~113x) with increasing number of loading threads where as the
HDFS throughput was limited to ~5x in un-secure clusters and up to
~3.7x in secure clusters. We narrowed it down to scalability
bottlenecks in HDFS RPC implementation (HADOOP-14558) on both the
server and the client side.

One thing to note here is that the thread pool based metadata fetching
is implemented only for loading HDFS block metadata and not for loading
HMS partition information. Our experiments showed that while loading large
partitioned tables, ~90% of the time is spent in connecting to NN and
loading the HDFS block information and optimizing the rest ~10% makes
the code unnecessarily complex without much gain.

Additional notes:

- The multithreading approach is implemented for
  * INVALIDATE (loading from scratch),
  * REFRESH (reusing existing md) code paths,
  * ALTER TABLE ADD/RECOVER PARTITIONS.

- This patch makes the implementation of ListMap thread-safe since
we use that datastructure as a shared state between multiple partition
metadata loding threads.

- While the configuration param max_s3_parts_parallel_load says s3, it
applies for any FileSystem implementation that doesn't support storage
IDs (like ADLS).

Testing and Results:

- This patch doesn't add any new tests since there is enough test
coverage already. Passed core/exhaustive runs with HDFS/S3.

- We noticed up to ~113x speed up on S3 tables(thread_pool_size=160)
and up to ~5x speed up in un-secure HDFS clusters and ~3.7x in secure
HDFS clusters.

- Benchmark improvements on a 16 node cluster (I = Improvement)

 100K-PARTITIONS-1M-FILES-CUSTOM-11-REFRESH-PARTITION I -16.4%
 100K-PARTITIONS-1M-FILES-CUSTOM-08-ADD-PARTITION I -17.25%
 100K-PARTITIONS-1M-FILES-CUSTOM-11-DROP-PARTITIONI -18.53%
 80-PARTITIONS-250K-FILES-11-REFRESH-PARTITIONI -23.57%
 80-PARTITIONS-250K-FILES-S3-08-ADD-PARTITION I -23.87%
 80-PARTITIONS-250K-FILES-09-INVALIDATE   I -24.88%
 80-PARTITIONS-250K-FILES-01-DROP I -34.82%
 80-PARTITIONS-250K-FILES-03-RECOVER  I -35.90%
 80-PARTITIONS-250K-FILES-07-REFRESH  I -43.03%
 100K-PARTITIONS-1M-FILES-CUSTOM-12-QUERY-PARTITIONS  I -43.93%
 100K-PARTITIONS-1M-FILES-CUSTOM-05-QUERY-AFTER-INV   I -46.59%
 80-PARTITIONS-250K-FILES-10-REFRESH-AFTER-ADD-PARTITION  I -48.71%
 100K-PARTITIONS-1M-FILES-CUSTOM-07-REFRESH   I -49.02%
 80-PARTITIONS-250K-FILES-05-QUERY-AFTER-INV  I -49.05%
 100K-PARTITIONS-1M-FILES-CUSTOM-10-REFRESH-AFTER-ADD-PARTI -51.87%
 80-PARTITIONS-250K-FILES-S3-03-RECOVER   I -67.17%
 80-PARTITIONS-250K-FILES-S3-01-DROP  I -70.38%
 80-PARTITIONS-250K-FILES-S3-05-QUERY-AFTER-INV   I -76.45%
 80-PARTITIONS-250K-FILES-S3-07-REFRESH   I -87.04%
 80-PARTITIONS-250K-FILES-S3-10-REFRESH-AFTER-ADD-PARTI -88.57%

Change-Id: I07eaa7151dfc4d56da8db8c2654bd65d8f808481
---
M be/src/catalog/catalog.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M 
fe/src/main/java/org/apache/impala/catalog/HdfsPartitionLocationCompressor.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/JniCatalog.java
M fe/src/main/java/org/apache/impala/util/ListMap.java
9 files changed, 386 insertions(+), 212 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/35/8235/4
--
To view, visit http://gerrit.cloudera.org:8080/8235
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I07eaa7151dfc4d56da8db8c2654bd65d8f808481
Gerrit-Change-Number: 8235
Gerrit-PatchSet: 4
Gerrit-Owner: Bharath Vissapragada 


[Impala-ASF-CR] [DOCS] Tighten up advice about first COMPUTE INCREMENTAL STATS

2017-10-06 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7999 )

Change subject: [DOCS] Tighten up advice about first COMPUTE INCREMENTAL STATS
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml
File docs/shared/impala_common.xml:

http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1244
PS2, Line 1244: 2 GB, a serious error can occur. If only a limited 
number of partitions are actively being
> Done. "Serious error" was my compromise I always used for MySQL, where the
Sorry I just realized the ask here. Yea we do need some numbers like 
#partitions #files and #blocks to come up with an estimate of a particular 
table's contribution to the memory footprint.  Roughly something of the order of

#partitions * 2kB + #files * 750B + # file_blocks * 300B

The problem is that there is no easy way a user can get these numbers for a 
given table.

Similarly for incremental stats we can use the HMS dump to get an estimate (or 
directly connect to the HMS db)

select sum(length(PARTITION_PARAMS.PARAM_KEY) + 
length(PARTITION_PARAMS.PARAM_VALUE)), PARTITIONS.TBL_ID from PARTITIONS, 
PARTITION_PARAMS where PARTITIONS.PART_ID = PARTITION_PARAMS.PART_ID and 
PARTITION_PARAMS.PARAM_KEY LIKE 'impala_intermediate%' group by 
PARTITIONS.TBL_ID;

But in most serious setups, end users can't do this. IMPALA-4870 aims to expose 
most of this data (WIP). May be we can revisit this doc once it is done.



--
To view, visit http://gerrit.cloudera.org:8080/7999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia53a6518ce5541e5c9a2cd896856ce042a599b03
Gerrit-Change-Number: 7999
Gerrit-PatchSet: 2
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Silvius Rus 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Fri, 06 Oct 2017 23:46:04 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] Revert "IMPALA-6012: workaround - downgrade hive temporarily"

2017-10-05 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8217 )

Change subject: Revert "IMPALA-6012: workaround - downgrade hive temporarily"
..


Patch Set 1: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/8217
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I01a0de583de4ec55a87ec54bca480dcb3904af1f
Gerrit-Change-Number: 8217
Gerrit-PatchSet: 1
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 05 Oct 2017 16:16:38 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6016: Fix logging in TableLoadingMgr class

2017-10-05 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8212 )

Change subject: IMPALA-6016: Fix logging in TableLoadingMgr class
..


Patch Set 3:

Failed due to IMPALA-5999. Retrying.


--
To view, visit http://gerrit.cloudera.org:8080/8212
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I925a6ba9a09be25df2759da5e6d85dfc8b981ce4
Gerrit-Change-Number: 8212
Gerrit-PatchSet: 3
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Comment-Date: Thu, 05 Oct 2017 16:14:50 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6016: Fix logging in TableLoadingMgr class

2017-10-04 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8212 )

Change subject: IMPALA-6016: Fix logging in TableLoadingMgr class
..


Patch Set 3: Code-Review+2

(1 comment)

Carrying +2.

http://gerrit.cloudera.org:8080/#/c/8212/2/fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
File fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java:

http://gerrit.cloudera.org:8080/#/c/8212/2/fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java@293
PS2, Line 293:   LOG.info("Metadata load request already in progress for 
table: " +
> Remove "Another"
Done



--
To view, visit http://gerrit.cloudera.org:8080/8212
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I925a6ba9a09be25df2759da5e6d85dfc8b981ce4
Gerrit-Change-Number: 8212
Gerrit-PatchSet: 3
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Comment-Date: Thu, 05 Oct 2017 05:57:46 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6016: Fix logging in TableLoadingMgr class

2017-10-04 Thread Bharath Vissapragada (Code Review)
Hello Alex Behm,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8212

to look at the new patch set (#3).

Change subject: IMPALA-6016: Fix logging in TableLoadingMgr class
..

IMPALA-6016: Fix logging in TableLoadingMgr class

This patch moves the logging of "loads in progress"
to a place where the current load is accounted. The
reason to move the logging is that the current load
is not reflected in the loadingTables_ till loadAsync()
is called.

Change-Id: I925a6ba9a09be25df2759da5e6d85dfc8b981ce4
---
M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
1 file changed, 6 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/12/8212/3
--
To view, visit http://gerrit.cloudera.org:8080/8212
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I925a6ba9a09be25df2759da5e6d85dfc8b981ce4
Gerrit-Change-Number: 8212
Gerrit-PatchSet: 3
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 


[Impala-ASF-CR] IMPALA-6016: Fix logging in TableLoadingMgr class

2017-10-04 Thread Bharath Vissapragada (Code Review)
Hello Alex Behm,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8212

to look at the new patch set (#2).

Change subject: IMPALA-6016: Fix logging in TableLoadingMgr class
..

IMPALA-6016: Fix logging in TableLoadingMgr class

This patch moves the logging of "loads in progress"
to a place where the current load is accounted. The
reason to move the logging is that the current load
is not reflected in the loadingTables_ till loadAsync()
is called.

Change-Id: I925a6ba9a09be25df2759da5e6d85dfc8b981ce4
---
M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
1 file changed, 6 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/12/8212/2
--
To view, visit http://gerrit.cloudera.org:8080/8212
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I925a6ba9a09be25df2759da5e6d85dfc8b981ce4
Gerrit-Change-Number: 8212
Gerrit-PatchSet: 2
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 


[Impala-ASF-CR] IMPALA-6016: Fix logging in TableLoadingMgr class

2017-10-04 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8212 )

Change subject: IMPALA-6016: Fix logging in TableLoadingMgr class
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8212/1/fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
File fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java:

http://gerrit.cloudera.org:8080/#/c/8212/1/fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java@72
PS1, Line 72: LOG.info(String.format("Remaining items in queue: %s. 
Loads in progress: %s",
> I think the "Remaining" part is now wrong because the size includes the tab
Not totally sure I understand this. Do you mean the current table (represented 
by tableName_) would be included in tableLoadingDeque_.size() ? If thats what 
you meant, doesn't L288 remove it off the queue? Please correct me if you meant 
something else here.


http://gerrit.cloudera.org:8080/#/c/8212/1/fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java@293
PS1, Line 293:   return;
> Can you also add a log message here indicating that the table is already be
Done



--
To view, visit http://gerrit.cloudera.org:8080/8212
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I925a6ba9a09be25df2759da5e6d85dfc8b981ce4
Gerrit-Change-Number: 8212
Gerrit-PatchSet: 1
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Comment-Date: Thu, 05 Oct 2017 02:18:24 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6016: Fix logging in TableLoadingMgr class

2017-10-04 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8212


Change subject: IMPALA-6016: Fix logging in TableLoadingMgr class
..

IMPALA-6016: Fix logging in TableLoadingMgr class

This patch moves the logging of "loads in progress"
to a place where the current load is accounted. The
reason to move the logging is that the current load
is not reflected in the loadingTables_ till loadAsync()
is called.

Change-Id: I925a6ba9a09be25df2759da5e6d85dfc8b981ce4
---
M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
1 file changed, 4 insertions(+), 5 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/12/8212/1
--
To view, visit http://gerrit.cloudera.org:8080/8212
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I925a6ba9a09be25df2759da5e6d85dfc8b981ce4
Gerrit-Change-Number: 8212
Gerrit-PatchSet: 1
Gerrit-Owner: Bharath Vissapragada 


[Impala-ASF-CR] IMPALA-6012: workaround - downgrade hive temporarily

2017-10-04 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8205 )

Change subject: IMPALA-6012: workaround - downgrade hive temporarily
..


Patch Set 1: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8205/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8205/1//COMMIT_MSG@11
PS1, Line 11:
May be add that this should be reverted once HIVE-17189 makes it to the 
dependencies?



--
To view, visit http://gerrit.cloudera.org:8080/8205
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I71cd293df1fc84429b23c94ad5c0564aa627a9a8
Gerrit-Change-Number: 8205
Gerrit-PatchSet: 1
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 04 Oct 2017 16:57:56 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6009: Upgrade Guava to 14.0.1

2017-10-03 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8198 )

Change subject: IMPALA-6009: Upgrade Guava to 14.0.1
..


Patch Set 4: Code-Review+1

This change looks ok to me. But given its implications across multiple 
dependent projects, I'm not confident enough to +2 it. May be someone else can 
do that.


--
To view, visit http://gerrit.cloudera.org:8080/8198
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iddc5da8849d5aa7317d3dc572884d05dee859bdd
Gerrit-Change-Number: 8198
Gerrit-PatchSet: 4
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Brown 
Gerrit-Comment-Date: Tue, 03 Oct 2017 20:00:30 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6009: Upgrade Guava to 14.0.1

2017-10-03 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8198 )

Change subject: IMPALA-6009: Upgrade Guava to 14.0.1
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8198/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8198/3//COMMIT_MSG@16
PS3, Line 16: temporary workaround
Why as a temporary workaround? IMO Guava upgrade is a good thing unless we are 
relying on something in 11.0.2 or if there is a better longterm solution. Also, 
I see that 14.0.1 is a pretty old version too [1]

[1] https://github.com/google/guava/wiki/Release14



--
To view, visit http://gerrit.cloudera.org:8080/8198
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iddc5da8849d5aa7317d3dc572884d05dee859bdd
Gerrit-Change-Number: 8198
Gerrit-PatchSet: 3
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Brown 
Gerrit-Comment-Date: Tue, 03 Oct 2017 17:29:35 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5990: Part 1: JNI-based LZ4 de/compression

2017-10-02 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8150 )

Change subject: IMPALA-5990: Part 1: JNI-based LZ4 de/compression
..


Patch Set 5: Code-Review+1

(4 comments)

The patch looks good to me once the comments are fixed.

http://gerrit.cloudera.org:8080/#/c/8150/5/be/src/service/fe-support.cc
File be/src/service/fe-support.cc:

http://gerrit.cloudera.org:8080/#/c/8150/5/be/src/service/fe-support.cc@457
PS5, Line 457: // It does not matter if 'src' is copied because we only read 
from it.
nit: May be document src_off, dst_off and return value for readability.


http://gerrit.cloudera.org:8080/#/c/8150/5/be/src/service/fe-support.cc@481
PS5, Line 481: env->SetByteArrayRegion(dst, 0, len, 
reinterpret_cast(dst_data));
Check the return value of this? Given it made a copy, its likely that this call 
can fail too?


http://gerrit.cloudera.org:8080/#/c/8150/5/be/src/util/jni-util.h
File be/src/util/jni-util.h:

http://gerrit.cloudera.org:8080/#/c/8150/5/be/src/util/jni-util.h@184
PS5, Line 184: !is_copy_ &&
My understanding is that we should call ReleasePrimitiveArrayCritical() 
irrespective of whether a copy is made. My reasoning is that it seems to be 
unblocking the GC thread, which I think is required.


http://gerrit.cloudera.org:8080/#/c/8150/5/be/src/util/jni-util.h@192
PS5, Line 192: ;
WARN_UNUSED_RESULT



--
To view, visit http://gerrit.cloudera.org:8080/8150
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I237802944875b07080db0159ff8ec548150fd95e
Gerrit-Change-Number: 8150
Gerrit-PatchSet: 5
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Mon, 02 Oct 2017 23:19:59 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5990: Part 1: JNI-based LZ4 de/compression

2017-09-29 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8150 )

Change subject: IMPALA-5990: Part 1: JNI-based LZ4 de/compression
..


Patch Set 3:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/8150/2/be/src/service/fe-support.cc
File be/src/service/fe-support.cc:

http://gerrit.cloudera.org:8080/#/c/8150/2/be/src/service/fe-support.cc@454
PS2, Line 454:
> My understanding is that the copied bytes would also be freed via ReleasePr
I'm not fully familiar with how it works internally, but per my reading of the 
JVM code, all this method does is to unlock the GC thread[1]. It looks to me 
like it doesn't free the copied bytes (I could be wrong).

(If you think this case is possible, I think we should have code to perform 
cleanup of copies. Please ignore if you think that is not possible.)

[1] 
http://hg.openjdk.java.net/jdk8/jdk8/hotspot/file/87ee5ee27509/src/share/vm/prims/jni.cpp#l4275


http://gerrit.cloudera.org:8080/#/c/8150/3/be/src/service/fe-support.cc
File be/src/service/fe-support.cc:

http://gerrit.cloudera.org:8080/#/c/8150/3/be/src/service/fe-support.cc@452
PS3, Line 452: the
remove


http://gerrit.cloudera.org:8080/#/c/8150/3/be/src/service/fe-support.cc@458
PS3, Line 458: Note that it does not matter if 'src' was copied
Why don't we treat it as an error? This could mean an OOM on the JVM?


http://gerrit.cloudera.org:8080/#/c/8150/3/be/src/service/fe-support.cc@471
PS3, Line 471: "
May be better to add that the JVM could've run out of memory (supportability).


http://gerrit.cloudera.org:8080/#/c/8150/2/fe/src/main/java/org/apache/impala/service/FeSupport.java
File fe/src/main/java/org/apache/impala/service/FeSupport.java:

http://gerrit.cloudera.org:8080/#/c/8150/2/fe/src/main/java/org/apache/impala/service/FeSupport.java@289
PS2, Line 289:  - Storing the uncompressed data size allows decompression to 
pre-allocate its output
 :* - Decompression requires the exact compressed data size, so 
storing it allows us to
 :*   avoid compacting the returned byte[]
 :*/
> I don't think that's right because NativeLz4CompressBound() returns 0 if th
Oh, I thought this was an overflow check. Just realized LZ4 has an inbuilt 
limit of 2113929216 bytes(~2GB)


http://gerrit.cloudera.org:8080/#/c/8150/2/fe/src/main/java/org/apache/impala/service/FeSupport.java@306
PS2, Line 306: if (compressedSize < 0) {
 :   throw new InternalException(
> Not sure what you mean. A Java byte[] cannot be shrunk without copying it i
Ok, sorry for not being clear. My point is, why have two versions, 
Lz4CompressToByteBuffer() and Lz4Compress(). Is that a requirement for the 
follow on patch?


http://gerrit.cloudera.org:8080/#/c/8150/3/fe/src/main/java/org/apache/impala/service/FeSupport.java
File fe/src/main/java/org/apache/impala/service/FeSupport.java:

http://gerrit.cloudera.org:8080/#/c/8150/3/fe/src/main/java/org/apache/impala/service/FeSupport.java@306
PS3, Line 306: if (compressedSize < 0) {
Looks like this check should be <=.

Per the documentation of LZ4_compress(),

return : the number of bytes written in buffer dest
 or 0 if the compression fails


http://gerrit.cloudera.org:8080/#/c/8150/3/fe/src/test/java/org/apache/impala/service/Lz4CompressionTest.java
File fe/src/test/java/org/apache/impala/service/Lz4CompressionTest.java:

http://gerrit.cloudera.org:8080/#/c/8150/3/fe/src/test/java/org/apache/impala/service/Lz4CompressionTest.java@66
PS3, Line 66: } catch (Throwable e) {
I think fail() throws an "Error" (AssertionError) which this Throwable can 
catch. May be make it Exception?


http://gerrit.cloudera.org:8080/#/c/8150/3/fe/src/test/java/org/apache/impala/service/Lz4CompressionTest.java@72
PS3, Line 72: e.printStackTrace();
unreachable?


http://gerrit.cloudera.org:8080/#/c/8150/3/fe/src/test/java/org/apache/impala/service/Lz4CompressionTest.java@78
PS3, Line 78:   private byte[] seqPayload(int numBytes) {
Add one liner comments on these helpers.



--
To view, visit http://gerrit.cloudera.org:8080/8150
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I237802944875b07080db0159ff8ec548150fd95e
Gerrit-Change-Number: 8150
Gerrit-PatchSet: 3
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Fri, 29 Sep 2017 22:16:41 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5990: Part 1: JNI-based LZ4 de/compression

2017-09-28 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8150 )

Change subject: IMPALA-5990: Part 1: JNI-based LZ4 de/compression
..


Patch Set 2:

(8 comments)

Didn't see the test class, but got some comments on the logic.

http://gerrit.cloudera.org:8080/#/c/8150/2/be/src/service/fe-support.cc
File be/src/service/fe-support.cc:

http://gerrit.cloudera.org:8080/#/c/8150/2/be/src/service/fe-support.cc@450
PS2, Line 450: // whether to compress or decompress the 'src' into 'dst'.
May be worth commenting about GetPrimitiveArrayCritical() and lifecycle of the 
buffer. Looks like it is up to the JVM to GC it eventually.


http://gerrit.cloudera.org:8080/#/c/8150/2/be/src/service/fe-support.cc@451
PS2, Line 451: extern "C"
Is extern required for this method? Also, no return type?


http://gerrit.cloudera.org:8080/#/c/8150/2/be/src/service/fe-support.cc@453
PS2, Line 453: jbyteArray dst, jint dst_off, jint dst_len, jboolean 
compress) {
DCHECK_GE(dst_len, LZ4_compressBound(src_size))?


http://gerrit.cloudera.org:8080/#/c/8150/2/be/src/service/fe-support.cc@454
PS2, Line 454: 0
nit: Not sure if this was intentional, but you are passing a nullptr argument. 
JDK can handle it ok [1], but looks like you need to pass a pointer to 0/1.

Also, we should probably make sure the (isCopy==false) (after the call), incase 
the JVM does a copy for some reason, we should free the copied native bytes.

[1] 
http://hg.openjdk.java.net/jdk8/jdk8/hotspot/file/87ee5ee27509/src/share/vm/prims/jni.cpp#l4244


http://gerrit.cloudera.org:8080/#/c/8150/2/be/src/service/fe-support.cc@455
PS2, Line 455:   if (src_data == nullptr) return -1;
Do we still need to call ReleasePrimitiveArrayCritical() in this case? Also, I 
see that you implemented a ScopedArrayCritical class to do that (much cleaner). 
Wondering why you've undone the changes? (same below)


http://gerrit.cloudera.org:8080/#/c/8150/2/fe/src/main/java/org/apache/impala/service/FeSupport.java
File fe/src/main/java/org/apache/impala/service/FeSupport.java:

http://gerrit.cloudera.org:8080/#/c/8150/2/fe/src/main/java/org/apache/impala/service/FeSupport.java@285
PS2, Line 285:*/
Probably worth commenting about the format of the output buffer.


http://gerrit.cloudera.org:8080/#/c/8150/2/fe/src/main/java/org/apache/impala/service/FeSupport.java@289
PS2, Line 289: if (dstLen <= 0) {
 :   throw new InternalException(
 :   "Payload is too big for LZ4 compression: " + 
payload.length);
 : }
Move this below L294?


http://gerrit.cloudera.org:8080/#/c/8150/2/fe/src/main/java/org/apache/impala/service/FeSupport.java@306
PS2, Line 306: ByteBuffer.wrap(dst).putInt(0, payload.length);
 : ByteBuffer.wrap(dst).putInt(4, compressedSize);
Can't we return the resized buffer directly here? Basically merge the logic of 
Lz4CompressToByteBuffer into this method?



--
To view, visit http://gerrit.cloudera.org:8080/8150
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I237802944875b07080db0159ff8ec548150fd95e
Gerrit-Change-Number: 8150
Gerrit-PatchSet: 2
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Thu, 28 Sep 2017 20:51:44 +
Gerrit-HasComments: Yes


[Impala-ASF-CR](asf-site) Change log for 2.10.0 release

2017-09-22 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8079 )

Change subject: Change log for 2.10.0 release
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8079/2/impala-docs.html
File impala-docs.html:

http://gerrit.cloudera.org:8080/#/c/8079/2/impala-docs.html@a148
PS2, Line 148:
> John had requested we maintain the "latest" in HTML and the archived in PDF
Just curious, any particular reason? (Apart from the Google search presenting 
older versions)



--
To view, visit http://gerrit.cloudera.org:8080/8079
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: asf-site
Gerrit-MessageType: comment
Gerrit-Change-Id: I5d47cf805205b25861e38d106412bb7f892016a0
Gerrit-Change-Number: 8079
Gerrit-PatchSet: 2
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Michael Brown 
Gerrit-Comment-Date: Fri, 22 Sep 2017 17:45:50 +
Gerrit-HasComments: Yes


[Impala-ASF-CR](asf-site) revive link to HTML docs for 2.9

2017-09-22 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8126 )

Change subject: revive link to HTML docs for 2.9
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8126/1/impala-docs.html
File impala-docs.html:

http://gerrit.cloudera.org:8080/#/c/8126/1/impala-docs.html@139
PS1, Line 139: documetnation
Typo



--
To view, visit http://gerrit.cloudera.org:8080/8126
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: asf-site
Gerrit-MessageType: comment
Gerrit-Change-Id: Id3cfedbea79496d7f89a49cb0143741b44eb4f55
Gerrit-Change-Number: 8126
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Brown 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Comment-Date: Fri, 22 Sep 2017 17:31:40 +
Gerrit-HasComments: Yes


[Impala-ASF-CR](asf-site) revive link to HTML docs for 2.9

2017-09-22 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8126 )

Change subject: revive link to HTML docs for 2.9
..


Patch Set 1: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/8126
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: asf-site
Gerrit-MessageType: comment
Gerrit-Change-Id: Id3cfedbea79496d7f89a49cb0143741b44eb4f55
Gerrit-Change-Number: 8126
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Brown 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Comment-Date: Fri, 22 Sep 2017 17:31:11 +
Gerrit-HasComments: No


[Impala-ASF-CR](asf-site) Change log for 2.10.0 release

2017-09-22 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8079 )

Change subject: Change log for 2.10.0 release
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8079/2/impala-docs.html
File impala-docs.html:

http://gerrit.cloudera.org:8080/#/c/8079/2/impala-docs.html@a148
PS2, Line 148:
> Hey Bharath, just a note, it's OK to keep this around until the next releas
Okay. Should we version the contents of that html directory and retain older 
docs too? (Copying stuff to docs/build/html//* and always symlink the 
latest to docs/build/html/latest/*)



--
To view, visit http://gerrit.cloudera.org:8080/8079
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: asf-site
Gerrit-MessageType: comment
Gerrit-Change-Id: I5d47cf805205b25861e38d106412bb7f892016a0
Gerrit-Change-Number: 8079
Gerrit-PatchSet: 2
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Michael Brown 
Gerrit-Comment-Date: Fri, 22 Sep 2017 17:28:06 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects

2017-09-22 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7731 )

Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects
..


Patch Set 6:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/7731/6/be/src/catalog/catalog-server.h
File be/src/catalog/catalog-server.h:

http://gerrit.cloudera.org:8080/#/c/7731/6/be/src/catalog/catalog-server.h@131
PS6, Line 131: Also determines any
 :   /// deletions of catalog objects by looking at the
 :   /// difference of the last set of topic entry keys that were 
sent and the current set
 :   /// of topic entry keys
Remove?


http://gerrit.cloudera.org:8080/#/c/7731/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/7731/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@369
PS5, Line 369:   catalogTbl.setTable(tbl.toThrift());
> It's racy if you do that. A concurrent table modification may have gotten a
Yea, got it, thanks.


http://gerrit.cloudera.org:8080/#/c/7731/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@970
PS5, Line 970: return removedTable;
> I thought about it but there are two reasons not to do it: a) we need the T
I see.


http://gerrit.cloudera.org:8080/#/c/7731/6/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/7731/6/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@310
PS6, Line 310: Tables, Views, Databases, and
 :* Functions)
datasources, cachepools, sentry stuff ... I think the list is too big, may be 
remove this, rather than keeping it stale?



--
To view, visit http://gerrit.cloudera.org:8080/7731
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c
Gerrit-Change-Number: 7731
Gerrit-PatchSet: 6
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Fri, 22 Sep 2017 16:20:30 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5954: Set DO NOT UPDATE STATS in alterTable() RPCs to HMS.

2017-09-20 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-5954: Set DO_NOT_UPDATE_STATS in alterTable() RPCs to 
HMS.
..


Patch Set 2: Code-Review+1

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8112/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

Line 800:   // HMS requires this param for stats changes to take effect.
> I checked the Hive/HMS code carefully and partition-level stat are not affe
Thanks for explaining. I agree.


PS1, Line 822: }
 : // HMS requires this param for stats changes to take effect.
> Setting this flag is required for for stats changes to take effect (table o
Sorry, I got confused. I now see how all this works.


-- 
To view, visit http://gerrit.cloudera.org:8080/8112
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I05c63315579d66efeaa4c65105372ae68820ab2f
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5954: Set DO NOT UPDATE STATS in alterTable() RPCs to HMS.

2017-09-20 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-5954: Set DO_NOT_UPDATE_STATS in alterTable() RPCs to 
HMS.
..


Patch Set 1: -Code-Review

(2 comments)

Sorry, got a couple more questions after going through the code again. Will +1 
again once I understand those scenarios.

http://gerrit.cloudera.org:8080/#/c/8112/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

Line 800:   
partition.putToParameters(MetastoreShim.statsGeneratedViaStatsTaskParam());
Are partition level stats also affected, just wanted to make sure that we don't 
need to set it here as well.


PS1, Line 822: Pair statsTaskParam = 
MetastoreShim.statsGeneratedViaStatsTaskParam();
 : msTbl.putToParameters(statsTaskParam.first, 
statsTaskParam.second);
This looks redundant and is overwritten anyway right?


-- 
To view, visit http://gerrit.cloudera.org:8080/8112
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I05c63315579d66efeaa4c65105372ae68820ab2f
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5955: Use totalSize tblproperty instead of rawDataSize.

2017-09-20 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-5955: Use totalSize tblproperty instead of rawDataSize.
..


Patch Set 1: Code-Review+1

-- 
To view, visit http://gerrit.cloudera.org:8080/8110
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If7c2c4e1e99b297c849f9f0d18b2bef34ad811c6
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5954: Set DO NOT UPDATE STATS in alterTable() RPCs to HMS.

2017-09-20 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-5954: Set DO_NOT_UPDATE_STATS in alterTable() RPCs to 
HMS.
..


Patch Set 1: Code-Review+1

Should we add more alter scenarios to "alter-table.test"  to make sure 
alterTable() doesn't update stats?

-- 
To view, visit http://gerrit.cloudera.org:8080/8112
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I05c63315579d66efeaa4c65105372ae68820ab2f
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5920: addendum - add missing RAT check

2017-09-19 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-5920: addendum - add missing RAT check
..


Patch Set 1: Code-Review+2

Looks simple enough to me. Hence +2'ing.

-- 
To view, visit http://gerrit.cloudera.org:8080/8108
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I44654004bef74b741cfdf4fb07c274a77320b818
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3516: Avoid writing to /tmp in testing

2017-09-19 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-3516: Avoid writing to /tmp in testing
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8047/3/fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
File fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java:

Line 763: FileWriter fw = new FileWriter(outDir_ + testFile + ".test");
> There's a bug in this patch. A "/" is missing here between outDir_ and test
I think its better to switch outDir_ to a "Path" type to avoid these kind of 
flaky stuff and use Path.toFile() wherever required.


-- 
To view, visit http://gerrit.cloudera.org:8080/8047
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9e503eb7d333c1b89dc8aea87cf30504838c44f9
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tianyi Wang 
Gerrit-HasComments: Yes


[Impala-ASF-CR] Increment version to 2.11.0-SNAPSHOT

2017-09-14 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/8080

Change subject: Increment version to 2.11.0-SNAPSHOT
..

Increment version to 2.11.0-SNAPSHOT

Change-Id: I2a60fbc5f2c1a9ba9697e6f1114bdf18997aa92c
---
M bin/save-version.sh
1 file changed, 1 insertion(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/8080/1
-- 
To view, visit http://gerrit.cloudera.org:8080/8080
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I2a60fbc5f2c1a9ba9697e6f1114bdf18997aa92c
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 


[Impala-ASF-CR](asf-site) Change log for 2.10.0 release

2017-09-14 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change.

Change subject: Change log for 2.10.0 release
..


Patch Set 1: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/8079
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5d47cf805205b25861e38d106412bb7f892016a0
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: asf-site
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Jim Apple 
Gerrit-HasComments: No


[Impala-ASF-CR](asf-site) Change log for 2.10.0 release

2017-09-14 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has submitted this change and it was merged.

Change subject: Change log for 2.10.0 release
..


Change log for 2.10.0 release

Change-Id: I5d47cf805205b25861e38d106412bb7f892016a0
Reviewed-on: http://gerrit.cloudera.org:8080/8079
Reviewed-by: Jim Apple 
Tested-by: Bharath Vissapragada 
---
A docs/changelog-2.10.html
M impala-docs.html
2 files changed, 590 insertions(+), 6 deletions(-)

Approvals:
  Bharath Vissapragada: Verified
  Jim Apple: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/8079
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I5d47cf805205b25861e38d106412bb7f892016a0
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: asf-site
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Jim Apple 


[Impala-ASF-CR](asf-site) Change log for 2.10.0 release

2017-09-14 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/8079

Change subject: Change log for 2.10.0 release
..

Change log for 2.10.0 release

Change-Id: I5d47cf805205b25861e38d106412bb7f892016a0
---
A docs/changelog-2.10.html
M impala-docs.html
2 files changed, 590 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/8079/1
-- 
To view, visit http://gerrit.cloudera.org:8080/8079
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I5d47cf805205b25861e38d106412bb7f892016a0
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: asf-site
Gerrit-Owner: Bharath Vissapragada 


[Impala-ASF-CR] IMPALA-5908: Allow SET to unset modified query options.

2017-09-14 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-5908: Allow SET to unset modified query options.
..


Patch Set 2: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8070/1//COMMIT_MSG
Commit Message:

Line 9: The query 'SET =""' will now unset option, reverting it to
> The new behaviour is more consistent with us reporting the default value of
Fair point. Thanks for explaining.


-- 
To view, visit http://gerrit.cloudera.org:8080/8070
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8c383e68064f839cb5000118901dff77b4e5cb9
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects

2017-09-14 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects
..


Patch Set 5:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/7731/5/be/src/catalog/catalog-server.h
File be/src/catalog/catalog-server.h:

PS5, Line 150: added
nit: May be 'added/updated/removed' to be consistent with other places?


http://gerrit.cloudera.org:8080/#/c/7731/5/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

PS5, Line 1366: ;
Could you add 'len' too, given this is trace logging anyway. (might be helpful 
in determining each key's contribution).


PS5, Line 1399: // Nothing to do here.
Remove? Kinda seems obvious


http://gerrit.cloudera.org:8080/#/c/7731/5/fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java:

PS5, Line 53: identiy
nit: typo


PS5, Line 123: if (first.getType() != second.getType()) return false;
This seems to already be handled by the next line (by generating a different 
name).


http://gerrit.cloudera.org:8080/#/c/7731/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

PS5, Line 318: TGetAllCatalogObjectsResponse resp = new 
TGetAllCatalogObjectsResponse();
 : resp.setUpdated_objects(new ArrayList());
 : resp.setDeleted_objects(new ArrayList());
 : resp.setMax_catalog_version(Catalog.INITIAL_CATALOG_VERSION);
Move all of this into getCatalogChanges() and make it return resp?


Line 350:* to 'resp'.
May be worth adding that this method expects that the global read lock is held 
for consistency.


Line 369:   if (tbl.getCatalogVersion() > fromVersion) {
Given we hold the global readLock() already (which means that the version 
cannot change while we are inside this function), can we move this above L366 ? 
This helps avoiding wait on un-necessary tables? Please correct me if I'm wrong 
in my assumption?


PS5, Line 374: trace
This seems like a serious enough error to me, basically blocking updates on a 
table to the impalads. Higher severity level may be?


Line 970: }
I see you made changes in the CatalogOpEx.drop*() to add the corresponding 
objects to the deltaLog_. Isn't it easier if we do it inside remove*() 
functions? Basically its up the Catalog  to maintain its own state (detlaLog_) 
rather than clients updating it? That way deltaLog_ can remain private too? 
Else anyone can getDeleteLog() and mess with it.


http://gerrit.cloudera.org:8080/#/c/7731/5/fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java
File fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java:

Line 322:*  Note that drop operations that come from statestore heartbeats 
always have a
Update?


-- 
To view, visit http://gerrit.cloudera.org:8080/7731
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-HasComments: Yes


[Impala-ASF-CR](asf-site) Update download and signature links for 2.10.0 release.

2017-09-14 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change.

Change subject: Update download and signature links for 2.10.0 release.
..


Patch Set 1: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/8073
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3072eda6f34a5e9245ddec70f725a65a13a14b78
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: asf-site
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Jim Apple 
Gerrit-HasComments: No


[Impala-ASF-CR](asf-site) Update download and signature links for 2.10.0 release.

2017-09-14 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has submitted this change and it was merged.

Change subject: Update download and signature links for 2.10.0 release.
..


Update download and signature links for 2.10.0 release.

Change-Id: I3072eda6f34a5e9245ddec70f725a65a13a14b78
Reviewed-on: http://gerrit.cloudera.org:8080/8073
Reviewed-by: Jim Apple 
Tested-by: Bharath Vissapragada 
---
M downloads.html
1 file changed, 16 insertions(+), 6 deletions(-)

Approvals:
  Bharath Vissapragada: Verified
  Jim Apple: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/8073
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I3072eda6f34a5e9245ddec70f725a65a13a14b78
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: asf-site
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Jim Apple 


[Impala-ASF-CR](asf-site) Update download and signature links for 2.10.0 release.

2017-09-14 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/8073

Change subject: Update download and signature links for 2.10.0 release.
..

Update download and signature links for 2.10.0 release.

Change-Id: I3072eda6f34a5e9245ddec70f725a65a13a14b78
---
M downloads.html
1 file changed, 16 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/73/8073/1
-- 
To view, visit http://gerrit.cloudera.org:8080/8073
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I3072eda6f34a5e9245ddec70f725a65a13a14b78
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: asf-site
Gerrit-Owner: Bharath Vissapragada 


[Impala-ASF-CR] IMPALA-5908: Allow SET to unset modified query options.

2017-09-14 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-5908: Allow SET to unset modified query options.
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8070/1//COMMIT_MSG
Commit Message:

Line 9: The query 'SET =""' will now unset option, reverting it to
A quick question, did you consider other alternatives like "UNSET " or 
"RESET " before settling for this?  

Reason being, users might confuse this with NONE. For example,

SET COMPRESSION_CODEC="" -> COMPRESSION_CODEC=SNAPPY (default)
SET COMPRESSION_CODE=NONE-> COMPRESSION_CODEC=NONE


-- 
To view, visit http://gerrit.cloudera.org:8080/8070
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8c383e68064f839cb5000118901dff77b4e5cb9
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR](asf-site) Revert "Update download and signature links for 2.10.0 release."

2017-09-13 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change.

Change subject: Revert "Update download and signature links for 2.10.0 release."
..


Patch Set 1: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/8057
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I592b22771a9b345448887c129eacfe4d88d7e59d
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: asf-site
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Jim Apple 
Gerrit-HasComments: No


[Impala-ASF-CR](asf-site) Revert "Update download and signature links for 2.10.0 release."

2017-09-13 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has submitted this change and it was merged.

Change subject: Revert "Update download and signature links for 2.10.0 release."
..


Revert "Update download and signature links for 2.10.0 release."

This reverts commit ff9d269a20ed1e452fe3a2f5450e71fcdcc4f30e.

Change-Id: I592b22771a9b345448887c129eacfe4d88d7e59d
Reviewed-on: http://gerrit.cloudera.org:8080/8057
Reviewed-by: Jim Apple 
Tested-by: Bharath Vissapragada 
---
M downloads.html
1 file changed, 6 insertions(+), 16 deletions(-)

Approvals:
  Bharath Vissapragada: Verified
  Jim Apple: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/8057
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I592b22771a9b345448887c129eacfe4d88d7e59d
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: asf-site
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Jim Apple 


[Impala-ASF-CR](asf-site) Revert "Update download and signature links for 2.10.0 release."

2017-09-13 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change.

Change subject: Revert "Update download and signature links for 2.10.0 release."
..


Patch Set 1:

ASF policy requires that we need to wait for at least 24 hours before updating 
the release artefacts' links so that all the mirrors can catch up. Reverting 
this website change till then.

-- 
To view, visit http://gerrit.cloudera.org:8080/8057
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I592b22771a9b345448887c129eacfe4d88d7e59d
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: asf-site
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-HasComments: No


[Impala-ASF-CR](asf-site) Revert "Update download and signature links for 2.10.0 release."

2017-09-13 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/8057

Change subject: Revert "Update download and signature links for 2.10.0 release."
..

Revert "Update download and signature links for 2.10.0 release."

This reverts commit ff9d269a20ed1e452fe3a2f5450e71fcdcc4f30e.

Change-Id: I592b22771a9b345448887c129eacfe4d88d7e59d
---
M downloads.html
1 file changed, 6 insertions(+), 16 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/57/8057/1
-- 
To view, visit http://gerrit.cloudera.org:8080/8057
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I592b22771a9b345448887c129eacfe4d88d7e59d
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: asf-site
Gerrit-Owner: Bharath Vissapragada 


[Impala-ASF-CR](asf-site) Update download and signature links for 2.10.0 release.

2017-09-13 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change.

Change subject: Update download and signature links for 2.10.0 release.
..


Patch Set 3: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/8052
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id29b30a06d4e0f64c08460cc9e58688ea8bf3f8d
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: asf-site
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Lars Volker 
Gerrit-HasComments: No


[Impala-ASF-CR](asf-site) Update download and signature links for 2.10.0 release.

2017-09-13 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has submitted this change and it was merged.

Change subject: Update download and signature links for 2.10.0 release.
..


Update download and signature links for 2.10.0 release.

Change-Id: Id29b30a06d4e0f64c08460cc9e58688ea8bf3f8d
Reviewed-on: http://gerrit.cloudera.org:8080/8052
Reviewed-by: Jim Apple 
Tested-by: Bharath Vissapragada 
---
M downloads.html
1 file changed, 16 insertions(+), 6 deletions(-)

Approvals:
  Bharath Vissapragada: Verified
  Jim Apple: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/8052
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Id29b30a06d4e0f64c08460cc9e58688ea8bf3f8d
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: asf-site
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Lars Volker 


[Impala-ASF-CR](asf-site) Update download and signature links for 2.10.0 release.

2017-09-13 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change.

Change subject: Update download and signature links for 2.10.0 release.
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8052/2/downloads.html
File downloads.html:

PS2, Line 163: 1
> 512
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/8052
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id29b30a06d4e0f64c08460cc9e58688ea8bf3f8d
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: asf-site
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Lars Volker 
Gerrit-HasComments: Yes


[Impala-ASF-CR](asf-site) Update download and signature links for 2.10.0 release.

2017-09-13 Thread Bharath Vissapragada (Code Review)
Hello Lars Volker,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/8052

to look at the new patch set (#3).

Change subject: Update download and signature links for 2.10.0 release.
..

Update download and signature links for 2.10.0 release.

Change-Id: Id29b30a06d4e0f64c08460cc9e58688ea8bf3f8d
---
M downloads.html
1 file changed, 16 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/52/8052/3
-- 
To view, visit http://gerrit.cloudera.org:8080/8052
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id29b30a06d4e0f64c08460cc9e58688ea8bf3f8d
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: asf-site
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Lars Volker 


[Impala-ASF-CR](asf-site) Update download and signature links for 2.10.0 release.

2017-09-13 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change.

Change subject: Update download and signature links for 2.10.0 release.
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8052/1/downloads.html
File downloads.html:

Line 162: 
"https://www.apache.org/dist/incubator/impala/2.10.0/apache-impala-incubating-2.10.0.tar.gz.sha";>
> 404. https://www.apache.org/dist/incubator/impala/2.10.0/apache-impala-incu
Ouch, missed that. Thanks.


-- 
To view, visit http://gerrit.cloudera.org:8080/8052
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id29b30a06d4e0f64c08460cc9e58688ea8bf3f8d
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: asf-site
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Lars Volker 
Gerrit-HasComments: Yes


[Impala-ASF-CR](asf-site) Update download and signature links for 2.10.0 release.

2017-09-13 Thread Bharath Vissapragada (Code Review)
Hello Lars Volker,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/8052

to look at the new patch set (#2).

Change subject: Update download and signature links for 2.10.0 release.
..

Update download and signature links for 2.10.0 release.

Change-Id: Id29b30a06d4e0f64c08460cc9e58688ea8bf3f8d
---
M downloads.html
1 file changed, 15 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/52/8052/2
-- 
To view, visit http://gerrit.cloudera.org:8080/8052
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id29b30a06d4e0f64c08460cc9e58688ea8bf3f8d
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: asf-site
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Lars Volker 


[Impala-ASF-CR](asf-site) Update download and signature links for 2.10.0 release.

2017-09-13 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/8052

Change subject: Update download and signature links for 2.10.0 release.
..

Update download and signature links for 2.10.0 release.

Change-Id: Id29b30a06d4e0f64c08460cc9e58688ea8bf3f8d
---
M downloads.html
1 file changed, 15 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/52/8052/1
-- 
To view, visit http://gerrit.cloudera.org:8080/8052
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Id29b30a06d4e0f64c08460cc9e58688ea8bf3f8d
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: asf-site
Gerrit-Owner: Bharath Vissapragada 


[Impala-ASF-CR] IMPALA-5881: Use native allocation while building catalog updates

2017-09-12 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has abandoned this change.

Change subject: IMPALA-5881: Use native allocation while building catalog 
updates
..


Abandoned

While this change works as expected, I think we can be more uniform in our 
approach so that the system is easy to reason about. With this patch, its 
possible that the Catalog can support large topic size while Impalads can't 
deserialize it (for example, a single table larger than 2GB). While we explore 
other options, I'm abandoning this change.

-- 
To view, visit http://gerrit.cloudera.org:8080/7955
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: abandon
Gerrit-Change-Id: I383684effa9524734ce3c6c0fb7ed37de0e15782
Gerrit-PatchSet: 19
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5881: Use native allocation while building catalog updates

2017-09-12 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-5881: Use native allocation while building catalog 
updates
..


Patch Set 19: Code-Review+2

Thanks for the reviews. Carrying +2.

-- 
To view, visit http://gerrit.cloudera.org:8080/7955
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I383684effa9524734ce3c6c0fb7ed37de0e15782
Gerrit-PatchSet: 19
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5881: Use native allocation while building catalog updates

2017-09-12 Thread Bharath Vissapragada (Code Review)
Hello Dimitris Tsirogiannis, Alex Behm,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/7955

to look at the new patch set (#19).

Change subject: IMPALA-5881: Use native allocation while building catalog 
updates
..

IMPALA-5881: Use native allocation while building catalog updates

This patch moves the allocation of thrift structures for serializing
the output of GetAllCatalogObjects() call into the native side. This
is done to prevent the Catalog from hitting JVM array limitations
(2GB maximum size) at scale. Additionally this patch also extends the
--compact_catalog_top=true to apply TCompactProtocol for the above
serialization to reduce the in-memory footprint.

This patch also caps the native allocations to not go beyond 4GB due
to Thrift library limitations. (Thrift internally uses uint32_t
datatype to represent the message size which limits the size to
~4.2GB).

Testing: Passed ASAN + HDFS core and DEBUG + HDFS exhaustive.

Deployed the patch on a 16 node cluster and tested it on a
Catalog-update topic of 3.5GB (uncompressed) or 780MB (compressed).

Change-Id: I383684effa9524734ce3c6c0fb7ed37de0e15782
---
M be/src/catalog/catalog.cc
M be/src/catalog/catalog.h
M be/src/rpc/jni-thrift-util.h
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M common/thrift/JniCatalog.thrift
M fe/src/main/java/org/apache/impala/service/JniCatalog.java
A fe/src/main/java/org/apache/impala/thrift/NativeAllocator.java
A fe/src/main/java/org/apache/impala/thrift/NativeByteArrayOutputStream.java
A fe/src/main/java/org/apache/impala/thrift/TNativeSerializer.java
A fe/src/test/java/org/apache/impala/thrift/NativeByteArrayOutputStreamTest.java
A fe/src/test/java/org/apache/impala/thrift/TNativeSerializerTest.java
12 files changed, 731 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/7955/19
-- 
To view, visit http://gerrit.cloudera.org:8080/7955
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I383684effa9524734ce3c6c0fb7ed37de0e15782
Gerrit-PatchSet: 19
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5881: Use native allocation while building catalog updates

2017-09-12 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-5881: Use native allocation while building catalog 
updates
..


Patch Set 18:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/7955/18/be/src/catalog/catalog.cc
File be/src/catalog/catalog.cc:

PS18, Line 117: DCHECK_LE(len, numeric_limits::max());
> This DCHECK needs to be run against nbuffer.buffer_len directly.
I'm not totally sure about this part, but looks like the static_cast() can 
silently fail. Either way I think its good to make this check on buffer_len


http://gerrit.cloudera.org:8080/#/c/7955/18/fe/src/main/java/org/apache/impala/service/JniCatalog.java
File fe/src/main/java/org/apache/impala/service/JniCatalog.java:

PS18, Line 155: Catalog
> nit: catalog
Done


http://gerrit.cloudera.org:8080/#/c/7955/18/fe/src/test/java/org/apache/impala/thrift/NativeByteArrayOutputStreamTest.java
File 
fe/src/test/java/org/apache/impala/thrift/NativeByteArrayOutputStreamTest.java:

Line 61: nbaos.write(b, offset, len);
> try catch and assert if we catch an exception or use the exceptionCaught pa
As discussed, not really needed since we expect the write to throw if an error 
occurs and then it is caught by the test.


Line 67:   // Makes sure that the memory is freed by the end of nboas.write().
> Expects a write() to fail and checks that the memory is freed after the uns
Done


Line 71:   nbaos.write(b, offset, len);
> Add an assert that fails if the write succeeded.
Done


Line 95:   public void nbaosTest() {
> NbaosTest
Done


Line 99: testAllocator  = new NativeTestAllocator();
> move into L96
Done


Line 100: // Check that initial allocation failure in NBAOS c'tor
> remove "that"
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7955
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I383684effa9524734ce3c6c0fb7ed37de0e15782
Gerrit-PatchSet: 18
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5881: Use native allocation while building catalog updates

2017-09-12 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has uploaded a new patch set (#18).

Change subject: IMPALA-5881: Use native allocation while building catalog 
updates
..

IMPALA-5881: Use native allocation while building catalog updates

This patch moves the allocation of thrift structures for serializing
the output of GetAllCatalogObjects() call into the native side. This
is done to prevent the Catalog from hitting JVM array limitations
(2GB maximum size) at scale. Additionally this patch also extends the
--compact_catalog_top=true to apply TCompactProtocol for the above
serialization to reduce the in-memory footprint.

This patch also caps the native allocations to not go beyond 4GB due
to Thrift library limitations. (Thrift internally uses uint32_t
datatype to represent the message size which limits the size to
~4.2GB).

Testing: Passed ASAN + HDFS core and DEBUG + HDFS exhaustive.

Deployed the patch on a 16 node cluster and tested it on a
Catalog-update topic of 3.5GB (uncompressed) or 780MB (compressed).

Change-Id: I383684effa9524734ce3c6c0fb7ed37de0e15782
---
M be/src/catalog/catalog.cc
M be/src/catalog/catalog.h
M be/src/rpc/jni-thrift-util.h
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M common/thrift/JniCatalog.thrift
M fe/src/main/java/org/apache/impala/service/JniCatalog.java
A fe/src/main/java/org/apache/impala/thrift/NativeAllocator.java
A fe/src/main/java/org/apache/impala/thrift/NativeByteArrayOutputStream.java
A fe/src/main/java/org/apache/impala/thrift/TNativeSerializer.java
A fe/src/test/java/org/apache/impala/thrift/NativeByteArrayOutputStreamTest.java
A fe/src/test/java/org/apache/impala/thrift/TNativeSerializerTest.java
12 files changed, 731 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/7955/18
-- 
To view, visit http://gerrit.cloudera.org:8080/7955
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I383684effa9524734ce3c6c0fb7ed37de0e15782
Gerrit-PatchSet: 18
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5881: Use native allocation while building catalog updates

2017-09-12 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-5881: Use native allocation while building catalog 
updates
..


Patch Set 17:

(7 comments)

Fixed a subtle bug in the test allocator.

http://gerrit.cloudera.org:8080/#/c/7955/17/fe/src/test/java/org/apache/impala/thrift/NativeByteArrayOutputStreamTest.java
File 
fe/src/test/java/org/apache/impala/thrift/NativeByteArrayOutputStreamTest.java:

Line 40:   || size >= 
NativeByteArrayOutputStream.BUFFER_MAX_SIZE_LIMIT) {
> Our real allocator would not OOM if the size is big, at least not in this p
Done


Line 60:   private void writeAndCheck(NativeByteArrayOutputStream nboas,
> I'd prefer to split this up into writeOk() and writeNotOk(). Otherwise, tes
Done


Line 78: NativeByteArrayOutputStream nboas =
> The first allocation happens here in the c'tor. We need to test that as wel
Done


Line 90: nboas = new NativeByteArrayOutputStream(testAllocator);
> nit: nbaos
Done


Line 108: 
> remove extraneous newines
Done


Line 114: testAllocator = new NativeTestAllocator();
> Add a helper function for these smaller tests that creates a new allocator,
Done


http://gerrit.cloudera.org:8080/#/c/7955/17/fe/src/test/java/org/apache/impala/thrift/TNativeSerializerTest.java
File fe/src/test/java/org/apache/impala/thrift/TNativeSerializerTest.java:

Line 81:   public void testBasicSerialization()
> We usually call these TestBasicSerialization (first letter is uppercase)
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7955
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I383684effa9524734ce3c6c0fb7ed37de0e15782
Gerrit-PatchSet: 17
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5881: Use native allocation while building catalog updates

2017-09-11 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has uploaded a new patch set (#17).

Change subject: IMPALA-5881: Use native allocation while building catalog 
updates
..

IMPALA-5881: Use native allocation while building catalog updates

This patch moves the allocation of thrift structures for serializing
the output of GetAllCatalogObjects() call into the native side. This
is done to prevent the Catalog from hitting JVM array limitations
(2GB maximum size) at scale. Additionally this patch also extends the
--compact_catalog_top=true to apply TCompactProtocol for the above
serialization to reduce the in-memory footprint.

This patch also caps the native allocations to not go beyond 4GB due
to Thrift library limitations. (Thrift internally uses uint32_t
datatype to represent the message size which limits the size to
~4.2GB).

Testing: Passed ASAN + HDFS core and DEBUG + HDFS exhaustive.

Deployed the patch on a 16 node cluster and tested it on a
Catalog-update topic of 3.5GB (uncompressed) or 780MB (compressed).

Change-Id: I383684effa9524734ce3c6c0fb7ed37de0e15782
---
M be/src/catalog/catalog.cc
M be/src/catalog/catalog.h
M be/src/rpc/jni-thrift-util.h
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M common/thrift/JniCatalog.thrift
M fe/src/main/java/org/apache/impala/service/JniCatalog.java
A fe/src/main/java/org/apache/impala/thrift/NativeAllocator.java
A fe/src/main/java/org/apache/impala/thrift/NativeByteArrayOutputStream.java
A fe/src/main/java/org/apache/impala/thrift/TNativeSerializer.java
A fe/src/test/java/org/apache/impala/thrift/NativeByteArrayOutputStreamTest.java
A fe/src/test/java/org/apache/impala/thrift/TNativeSerializerTest.java
12 files changed, 716 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/7955/17
-- 
To view, visit http://gerrit.cloudera.org:8080/7955
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I383684effa9524734ce3c6c0fb7ed37de0e15782
Gerrit-PatchSet: 17
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5881: Use native allocation while building catalog updates

2017-09-11 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-5881: Use native allocation while building catalog 
updates
..


Patch Set 16:

(14 comments)

http://gerrit.cloudera.org:8080/#/c/7955/16/be/src/catalog/catalog.cc
File be/src/catalog/catalog.cc:

Line 120:   // Now that we deserialize the thrift objects into 
TGetAllCatalogObjectsResponse,
> deserialized (past tense)
Done


http://gerrit.cloudera.org:8080/#/c/7955/16/fe/src/main/java/org/apache/impala/service/JniCatalog.java
File fe/src/main/java/org/apache/impala/service/JniCatalog.java:

Line 84:   // flag.
> remove this line
Done


http://gerrit.cloudera.org:8080/#/c/7955/16/fe/src/main/java/org/apache/impala/thrift/NativeByteArrayOutputStream.java
File fe/src/main/java/org/apache/impala/thrift/NativeByteArrayOutputStream.java:

Line 61:   " maximum limit of " + BUFFER_MAX_SIZE_LIMIT + " bytes";
> remove first space in string
Done


Line 99:* - bufferPtr_>=0
> use consistent spacing, i.e. bufferPtr_ >= 0
Done


Line 102: Preconditions.checkState(bufferPtr_ >= 0);
> remove (documenting in comment is better here)
Done


http://gerrit.cloudera.org:8080/#/c/7955/16/fe/src/test/java/org/apache/impala/thrift/NativeByteArrayOutputStreamTest.java
File 
fe/src/test/java/org/apache/impala/thrift/NativeByteArrayOutputStreamTest.java:

Line 28:   // Custom NativeAllocator that accounts the allocated and freed 
bytes and randomly
> the allocated/freed bytes and simulates allocation failures
Done


Line 39:   // Throws an OOM on every alternate call to reallocate().
> on every 2nd call
Done


Line 51:   allocatedBytes_ = 0;
> single line
Done


Line 87: while (testAllocator_.getAllocationFailures() < 10) {
> I still don't think this is a great approach because it does not reflect re
Makes sense. Deterministic tests are always better and the test combinations 
are small too.


Line 91: writeAndCheck(b, 1, 0);
> These should be tested from a fresh Nbaos and not from the one that is alre
yea I added a new nboas for all these set of tests.


http://gerrit.cloudera.org:8080/#/c/7955/16/fe/src/test/java/org/apache/impala/thrift/TNativeSerializerTest.java
File fe/src/test/java/org/apache/impala/thrift/TNativeSerializerTest.java:

Line 58:   // Deserialize the object at result.buffer_ptr and we confirm it 
is the same
> remove "we"
Done


Line 119: Assert.assertTrue(nativeSerializer.getSerializedBytesSize() >= 
3584 * 1024 * 1024);
> Isn't there an assertGe() or something like that? Also print a message with
Yea, I didn't find it.


Line 137:   e.printStackTrace();
> remove
Done


Line 166:   testBasicSerialization(factory);
> Let's make these separate top-level tests, if's fine to repeat the loop ove
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7955
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I383684effa9524734ce3c6c0fb7ed37de0e15782
Gerrit-PatchSet: 16
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5881: Use native allocation while building catalog updates

2017-09-08 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-5881: Use native allocation while building catalog 
updates
..


Patch Set 15:

(1 comment)

The test as such in current state is susceptible to OOM, given we are doing 
huge allocations. I'm figuring out a way to workaround that so that it is not 
flaky. Meanwhile pushing out the test for review.

http://gerrit.cloudera.org:8080/#/c/7955/15/fe/src/main/java/org/apache/impala/thrift/NativeByteArrayOutputStream.java
File fe/src/main/java/org/apache/impala/thrift/NativeByteArrayOutputStream.java:

PS15, Line 103: System.out.println(len);
Removed.


-- 
To view, visit http://gerrit.cloudera.org:8080/7955
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I383684effa9524734ce3c6c0fb7ed37de0e15782
Gerrit-PatchSet: 15
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


  1   2   3   4   5   6   >