[jira] [Commented] (KUDU-2722) Ability to mark a partition or table as read only

2019-06-28 Thread xiaokai.wang (JIRA)


[ 
https://issues.apache.org/jira/browse/KUDU-2722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16875306#comment-16875306
 ] 

xiaokai.wang commented on KUDU-2722:


Working on this.

> Ability to mark a partition or table as read only
> -------------------------------------------------
>
> Key: KUDU-2722
> URL: https://issues.apache.org/jira/browse/KUDU-2722
> Project: Kudu
>  Issue Type: Improvement
>Affects Versions: 1.8.0
>Reporter: Grant Henke
>Priority: Major
>  Labels: roadmap-candidate
>
> It could be useful to prevent data from being mutated in a table or 
> partition. For example, this would allow users to lock older range partitions 
> from receiving inserts/updates/deletes, ensuring any queries/reports running 
> on that data always show the same results.
> There might also be optimization (resource/storage) opportunities we could 
> make server side once a table is marked as read only. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KUDU-2192) KRPC should have a timer to close stuck connections

2019-06-28 Thread Michael Ho (JIRA)


[ 
https://issues.apache.org/jira/browse/KUDU-2192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16875292#comment-16875292
 ] 

Michael Ho commented on KUDU-2192:
--

Added SO_KEEPALIVE in 
[https://github.com/apache/kudu/commit/89c02fded7595b4712b465bfb939e4f3035b2e75]
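
For context, enabling SO_KEEPALIVE (plus the Linux-specific probe-tuning
options) can be sketched in Python as below. The idle/interval/count values
are illustrative only, not Kudu's actual settings:

```python
import socket

def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Enable TCP keepalive so a dead peer is detected after roughly
    idle + interval * count seconds (values here are illustrative)."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Linux-specific knobs; guarded because they are not available everywhere.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)
    return sock

s = enable_keepalive(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
s.close()
```

Note that keepalive only detects a dead peer at the TCP level; it does not
replace an application-level progress timeout for a reachable-but-stuck peer.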

> KRPC should have a timer to close stuck connections
> ---------------------------------------------------
>
> Key: KUDU-2192
> URL: https://issues.apache.org/jira/browse/KUDU-2192
> Project: Kudu
>  Issue Type: Improvement
>  Components: rpc
>Reporter: Michael Ho
>Assignee: Michael Ho
>Priority: Major
> Fix For: n/a
>
>
> If the remote host goes down or its network gets unplugged, all pending RPCs 
> to that host will be stuck if there is no timeout specified. While those RPCs 
> which have finished sending their payloads or those which haven't started 
> sending payloads can be cancelled quickly, those in mid-transmission (i.e. an 
> RPC at the front of the outbound queue with part of its payload sent already) 
> cannot be cancelled until the payload has been completely sent. Therefore, 
> it's beneficial to have a timeout to kill a connection if it's not making any 
> progress for an extended period of time so the RPC will fail and get unstuck. 
> The timeout may need to be conservatively large to avoid aggressive closing 
> of connections due to transient network issue. One can consider augmenting 
> the existing maintenance thread logic which checks for idle connection to 
> check for this kind of timeout. Please feel free to propose other 
> alternatives (e.g. TCP keepalive timeout) in this JIRA.
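
The maintenance-thread idea described above (close a connection that has made
no transmission progress for too long, failing its RPCs so they get unstuck)
can be sketched as follows. The class and function names are invented for
illustration; Kudu's actual implementation is in C++ and differs:

```python
import time

class Connection:
    """Toy stand-in for an RPC connection (illustrative, not a Kudu class)."""
    def __init__(self):
        self.bytes_sent = 0
        self.last_progress = time.monotonic()

    def record_send(self, n):
        # Any successfully transmitted bytes count as progress.
        self.bytes_sent += n
        self.last_progress = time.monotonic()

def reap_stuck(connections, timeout_secs, now=None):
    """Return connections with no transmission progress for longer than
    timeout_secs; the caller would close them so pending RPCs fail fast
    and can be retried. The timeout should be conservatively large to
    tolerate transient network slowness."""
    now = time.monotonic() if now is None else now
    return [c for c in connections if now - c.last_progress > timeout_secs]
```

A periodic maintenance pass would call reap_stuck alongside the existing
idle-connection check and close whatever it returns.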





[jira] [Updated] (KUDU-1620) Consensus peer proxy hostnames should be reresolved on failure

2019-06-28 Thread Grant Henke (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-1620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-1620:
--
Labels: docker  (was: )

> Consensus peer proxy hostnames should be reresolved on failure
> --------------------------------------------------------------
>
> Key: KUDU-1620
> URL: https://issues.apache.org/jira/browse/KUDU-1620
> Project: Kudu
>  Issue Type: Bug
>  Components: consensus
>Affects Versions: 1.0.0
>Reporter: Adar Dembo
>Priority: Major
>  Labels: docker
>
> Noticed this while documenting the workflow to replace a dead master, which 
> currently bypasses Raft config changes in favor of having the replacement 
> master "masquerade" as the dead master via DNS changes.
> Internally we never rebuild consensus peer proxies in the event of network 
> failure; we assume that the peer will return at the same location. Nominally 
> this is reasonable; allowing peers to change host/port information on the fly 
> is tricky and has yet to be implemented. But, we should at least retry the 
> DNS resolution; not doing so forces the workflow to include steps to restart 
> the existing masters, which creates a (small) availability outage.
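
The retry-DNS-resolution idea above can be sketched like this; the function
and its parameters are illustrative, not Kudu code:

```python
import socket
import time

def resolve_with_retry(host, port, attempts=3, backoff_secs=1.0):
    """Re-run DNS resolution on failure with exponential backoff, instead
    of caching a failed (or stale) result forever. This is what a proxy
    rebuild path could do when a peer becomes unreachable."""
    last_err = None
    for attempt in range(attempts):
        try:
            # Returns a list of (family, type, proto, canonname, sockaddr).
            return socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
        except socket.gaierror as e:
            last_err = e
            time.sleep(backoff_secs * (2 ** attempt))
    raise last_err
```

With this in the failure path, a replacement master that reappears under the
same hostname but a new IP would be picked up without restarting the peers.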





[jira] [Commented] (KUDU-1620) Consensus peer proxy hostnames should be reresolved on failure

2019-06-28 Thread Grant Henke (JIRA)


[ 
https://issues.apache.org/jira/browse/KUDU-1620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16875253#comment-16875253
 ] 

Grant Henke commented on KUDU-1620:
---

There was a discussion in the Slack channel about how this impacts 
Kubernetes/Docker deployments. 

If a container/instance/host running a master is down or crashed, tservers are 
unable to restart and fail with something like:
{noformat}
Tservers start crashing with the logs saying 'Couldn't resolve master service 
address 'master-1': unable to resolve address for master-1: Temporary failure 
in name resolution'. 
{noformat}

[~tlipcon] mentioned an option for a workaround in the thread:
{quote}
ah, maybe we will stay up if the master resolution fails at runtime, but will 
refuse to start if one of the masters is unresolvable.
which is probably on purpose to avoid people starting with typos in their 
master list and not noticing
for the k8s use case I can see why you might want it, though -- perhaps we need 
some flag to explicitly allow starting when some number of the masters are 
unresolvable, and be sure it keeps retrying to resolve
{quote}

This also appears to be why we added a 2 second delay to the docker entrypoint 
script: 
https://github.com/apache/kudu/blob/ad798391fdf22c1632a641dbb6be80085636602a/docker/kudu-entrypoint.sh#L80
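
The flag [~tlipcon] suggests could work roughly like the sketch below. The
function, the max_unresolvable parameter, and the behavior are hypothetical,
illustrating the proposed workaround rather than an actual Kudu option:

```python
import socket

def check_masters(master_addrs, max_unresolvable=0):
    """At startup, fail fast only when more than max_unresolvable master
    addresses cannot be resolved. With the default of 0 this keeps the
    existing protection against typos in the master list; a k8s deployment
    could raise it and keep retrying the stragglers in the background."""
    unresolved = []
    for host, port in master_addrs:
        try:
            socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
        except socket.gaierror:
            unresolved.append((host, port))
    if len(unresolved) > max_unresolvable:
        raise RuntimeError(f"unresolvable masters: {unresolved}")
    return unresolved  # to be re-resolved later instead of crashing
```

This would also remove the need for the fixed startup delay in the docker
entrypoint script linked above.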



 






[jira] [Assigned] (KUDU-2212) TSAN "destroy of locked mutex" failure in kudu-admin-test

2019-06-28 Thread Alexey Serbin (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-2212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Serbin reassigned KUDU-2212:
---

Assignee: Alexey Serbin

> TSAN "destroy of locked mutex" failure in kudu-admin-test
> ---------------------------------------------------------
>
> Key: KUDU-2212
> URL: https://issues.apache.org/jira/browse/KUDU-2212
> Project: Kudu
>  Issue Type: Bug
>  Components: test
>Reporter: Todd Lipcon
>Assignee: Alexey Serbin
>Priority: Major
> Attachments: kudu-tool-test.0.txt.gz
>
>
> admin cli test is flaky with:
> WARNING: ThreadSanitizer: destroy of a locked mutex (pid=16401)
> #0 pthread_rwlock_destroy 
> /data/somelongdirectorytoavoidrpathissues/src/kudu/thirdparty/src/llvm-4.0.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:1214
>  (kudu+0x49c866)
> #1 glog_internal_namespace_::Mutex::~Mutex() 
> /data/somelongdirectorytoavoidrpathissues/src/kudu/thirdparty/src/glog-0.3.5/src/base/mutex.h:249:30
>  (libglog.so.0+0x15878)
> #2 at_exit_wrapper(void*) 
> /data/somelongdirectorytoavoidrpathissues/src/kudu/thirdparty/src/llvm-4.0.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:375
>  (kudu+0x4706a3)
>   and:
> #0 memcpy 
> /data/somelongdirectorytoavoidrpathissues/src/kudu/thirdparty/src/llvm-4.0.0.src/projects/compiler-rt/lib/tsan/../sanitizer_common/sanitizer_common_interceptors.inc:655
>  (kudu+0x49344c)
> #1 std::__1::char_traits::copy(char*, char const*, unsigned long) 
> /data/somelongdirectorytoavoidrpathissues/src/kudu/thirdparty/installed/tsan/include/c++/v1/__string:223:50
>  (kudu+0x51285e)
> #2 std::__1::basic_string, 
> std::__1::allocator >::__init(char const*, unsigned long) 
> /data/somelongdirectorytoavoidrpathissues/src/kudu/thirdparty/installed/tsan/include/c++/v1/string:1538:5
>  (libkudu_util.so+0x13c118)
> #3 std::__1::basic_string, 
> std::__1::allocator >::basic_string(char const*) 
> /data/somelongdirectorytoavoidrpathissues/src/kudu/thirdparty/installed/tsan/include/c++/v1/string:1547
>  (libkudu_util.so+0x13c118)
> #4 kudu::flag_tags_internal::FlagTagger::FlagTagger(char const*, char 
> const*) 
> /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/util/flag_tags.cc:76
>  (libkudu_util.so+0x13c118)
> #5 __cxx_global_var_init.8 
> /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/server/rpc_server.cc:54:1
>  (libserver_process.so+0x5f7c8)
> #6 _GLOBAL__sub_I_rpc_server.cc 
> /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/server/rpc_server.cc
>  (libserver_process.so+0x5fcfc)
> #7 _dl_init_internal  (ld-linux-x86-64.so.2+0xe64e)
>   Location is global 'google::log_mutex' of size 64 at 0x7ff79ecd3038 
> (libglog.so.0+0x0022b038)
>   Mutex M115 (0x7ff79ecd3038) created at:
> #0 pthread_rwlock_init 
> /data/somelongdirectorytoavoidrpathissues/src/kudu/thirdparty/src/llvm-4.0.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:1205
>  (kudu+0x49ca9c)
> #1 glog_internal_namespace_::Mutex::Mutex() 
> /data/somelongdirectorytoavoidrpathissues/src/kudu/thirdparty/src/glog-0.3.5/src/base/mutex.h:247:19
>  (libglog.so.0+0xc5f3)
> #2 __cxx_global_var_init.130 
> /data/somelongdirectorytoavoidrpathissues/src/kudu/thirdparty/src/glog-0.3.5/src/logging.cc:372
>  (libglog.so.0+0xc5f3)
> #3 _GLOBAL__sub_I_logging.cc 
> /data/somelongdirectorytoavoidrpathissues/src/kudu/thirdparty/src/glog-0.3.5/src/logging.cc
>  (libglog.so.0+0xc5f3)
> #4 _dl_init_internal  (ld-linux-x86-64.so.2+0xe64e)





[jira] [Updated] (KUDU-2212) TSAN "destroy of locked mutex" failure in kudu-admin-test

2019-06-28 Thread Alexey Serbin (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-2212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Serbin updated KUDU-2212:

Status: In Review  (was: Open)






[jira] [Updated] (KUDU-2212) TSAN "destroy of locked mutex" failure in kudu-admin-test

2019-06-28 Thread Alexey Serbin (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-2212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Serbin updated KUDU-2212:

Code Review: https://gerrit.cloudera.org/#/c/13763/



