[jira] [Created] (KUDU-2865) Relax the requirements to get an authorization token

2019-06-15 Thread Andrew Wong (JIRA)
Andrew Wong created KUDU-2865:
-

 Summary: Relax the requirements to get an authorization token
 Key: KUDU-2865
 URL: https://issues.apache.org/jira/browse/KUDU-2865
 Project: Kudu
  Issue Type: Improvement
  Components: authz
Affects Versions: 1.10.0
Reporter: Andrew Wong


Currently in order to do any DML with Kudu, a user must have any (i.e. 
"METADATA") privilege on a table so the user can get an authorization token. 
This is because authz token generation is piggy-backed on the GetTableSchema 
endpoint, which does all-or-nothing authorization for the table.

This isn't a great user experience, e.g. if a user only has column-level 
privileges. Unless such a user _also_ has a table-level privilege (e.g. insert 
privileges on the table), the user is unable to scan the columns through 
direct Kudu APIs. We should consider modifying the GetTableSchema endpoint to 
return only the sub-schema on which the user has column-level privileges or 
higher, along with those privileges.

This user experience would be closer to what is supported by Apache Impala.
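
As a rough illustration of the behavior proposed in KUDU-2865 (not Kudu's actual GetTableSchema code), the sketch below reduces a table schema to the columns on which a user holds at least one privilege. The function name, the privilege strings, and the data layout are all hypothetical.

```python
from typing import Dict, List, Set


def authorized_subschema(
    columns: List[str],
    table_privileges: Set[str],
    column_privileges: Dict[str, Set[str]],
) -> Dict[str, Set[str]]:
    """Return {column: privileges} for every column the user may see.

    Assumed model: a table-level privilege applies to every column, and a
    column is returned if the user holds at least one privilege on it.
    """
    visible: Dict[str, Set[str]] = {}
    for col in columns:
        privs = set(table_privileges) | column_privileges.get(col, set())
        if privs:
            visible[col] = privs
    return visible
```

With only a column-level privilege on `a`, a call such as `authorized_subschema(["a", "b"], set(), {"a": {"SELECT"}})` would return just column `a`, instead of rejecting the request outright.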



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (KUDU-2810) Restore needs DELETE_IGNORE

2019-06-15 Thread Grant Henke (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-2810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke resolved KUDU-2810.
---
   Resolution: Fixed
Fix Version/s: 1.10.0

Resolved via 11a6a06d646fc852f26e9ac6cfda8f3c5c09b9b0

Remaining improvements can be tracked in KUDU-1563

> Restore needs DELETE_IGNORE
> ---
>
> Key: KUDU-2810
> URL: https://issues.apache.org/jira/browse/KUDU-2810
> Project: Kudu
>  Issue Type: Bug
>  Components: backup
>Affects Versions: 1.9.0
>Reporter: Will Berkeley
>Assignee: Grant Henke
>Priority: Major
> Fix For: 1.10.0
>
>
> If a restore task fails for any reason while restoring an incremental 
> backup with DELETE row actions, the retried task will fail on any deletes 
> that were already applied by the previous run. We need a DELETE_IGNORE write 
> operation to handle this.
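
To make the retry problem in KUDU-2810 concrete, here is a minimal sketch that models a table as an in-memory dict. It is not Kudu client code; `delete_row` and `delete_ignore_row` are hypothetical names that only illustrate the difference between a strict delete and an idempotent one.

```python
_MISSING = object()  # sentinel so that stored None values are handled correctly


def delete_row(table: dict, key) -> None:
    """Strict delete: fails if the row is absent (models a plain DELETE)."""
    if key not in table:
        raise KeyError(f"key not found: {key!r}")
    del table[key]


def delete_ignore_row(table: dict, key) -> bool:
    """Idempotent delete: a missing row is not an error.

    Returns True if a row was removed, False if it was already gone.
    """
    return table.pop(key, _MISSING) is not _MISSING
```

Replaying the same delete with `delete_ignore_row` is harmless, which is the property a retried restore task needs.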





[jira] [Updated] (KUDU-2780) Rebalance Kudu cluster in background

2019-06-15 Thread Grant Henke (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-2780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-2780:
--
Labels: kudu-roadmap  (was: )

> Rebalance Kudu cluster in background
> 
>
> Key: KUDU-2780
> URL: https://issues.apache.org/jira/browse/KUDU-2780
> Project: Kudu
>  Issue Type: Improvement
>Reporter: Alexey Serbin
>Assignee: HeLifu
>Priority: Major
>  Labels: kudu-roadmap
>
> With the introduction of `kudu cluster rebalance` CLI tool it's possible to 
> balance the distribution of tablet replicas in a Kudu cluster.  However, that 
> tool should be run manually or via an external scheduler (e.g. cron).
> It would be nice if Kudu would track and correct imbalances of replica 
> distribution automatically.





[jira] [Updated] (KUDU-2739) Better support for running servers ephemerally

2019-06-15 Thread Grant Henke (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-2739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-2739:
--
Issue Type: Improvement  (was: Bug)

> Better support for running servers ephemerally
> --
>
> Key: KUDU-2739
> URL: https://issues.apache.org/jira/browse/KUDU-2739
> Project: Kudu
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 1.9.0
>Reporter: Adar Dembo
>Priority: Major
>
> Oftentimes during development it can be useful to run a Kudu server 
> "ephemerally", which is just another way of saying "the server should create 
> as few on-disk files as possible". We can't avoid creating WALs and data 
> files, but it should be possible to avoid:
>  # glog files
>  # diagnostic log files
>  # minidumps
> Prior to the introduction of #2 and #3, running Kudu with {{--logtostderr}} 
> was all one needed in order to minimize the footprint of the server. However, 
> #2 and #3 both use the value of {{--log_dir}} (defaults to /tmp) to decide 
> where to place their output.
> To avoid introducing a new pattern, it'd be nice to once again consider 
> {{--logtostderr}} as an indication that the user is trying to run the 
> server ephemerally and to minimize output (by disabling diagnostic logs and 
> minidumps). However, we do need to account for cases where log files are 
> desirable, but the user ran Kudu with {{--logtostderr}} and shell 
> redirection in order to manage the location of the logging.
> One possible approach: disable diagnostic logs and minidumps if 
> {{--logtostderr}} is set and if {{--log_dir}} is _not_ set.
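
The approach suggested at the end of KUDU-2739 boils down to a single predicate. The sketch below is only an illustration of that rule; the function name is hypothetical, and it assumes the caller can tell whether the user explicitly passed a log directory (modeled here as `None` when unset).

```python
from typing import Optional


def should_disable_extra_output(logtostderr: bool, log_dir: Optional[str]) -> bool:
    """Return True when diagnostic logs and minidumps should be disabled.

    log_dir is None when the user did not set the log directory explicitly;
    a default value such as /tmp must not count as "set" for this check.
    """
    return logtostderr and log_dir is None
```

A user who redirects stderr but still wants diagnostic output would pass an explicit log directory, and the predicate would then return False.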





[jira] [Updated] (KUDU-2737) Allow KuduContext row errors to be handled

2019-06-15 Thread Grant Henke (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-2737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-2737:
--
Labels: usability  (was: )

> Allow KuduContext row errors to be handled
> --
>
> Key: KUDU-2737
> URL: https://issues.apache.org/jira/browse/KUDU-2737
> Project: Kudu
>  Issue Type: Improvement
>  Components: spark
>Affects Versions: 1.9.0
>Reporter: Grant Henke
>Priority: Major
>  Labels: usability
>
> Currently when writing to Kudu via Spark the writeRows API detects all row 
> errors and throws a RuntimeException with some of the sample errors included 
> in the string: 
> https://github.com/apache/kudu/blob/master/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala#L344
> We should optionally return these row errors and allow users to handle them, 
> or potentially take an error handler function to allow custom error handling 
> logic to be passed through. 
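
The two options mentioned in KUDU-2737 (returning row errors, or accepting an error handler) can be sketched together. This is a language-neutral illustration written in Python, not the kudu-spark API: `write_rows`, `apply_row`, `on_errors`, and the `(row, message)` error shape are all hypothetical.

```python
from typing import Callable, Iterable, List, Optional, Tuple

RowError = Tuple[object, str]  # (row, error message); hypothetical shape


def write_rows(
    rows: Iterable[object],
    apply_row: Callable[[object], Optional[str]],
    on_errors: Optional[Callable[[List[RowError]], None]] = None,
) -> List[RowError]:
    """Apply every row, collect per-row errors, and return them.

    Without a handler, errors raise RuntimeError with a few sample messages,
    mirroring the behavior described above. With a handler, the errors are
    passed to it instead and also returned to the caller.
    """
    errors: List[RowError] = []
    for row in rows:
        message = apply_row(row)  # None means the row was written successfully
        if message is not None:
            errors.append((row, message))
    if errors:
        if on_errors is None:
            sample = "; ".join(msg for _, msg in errors[:5])
            raise RuntimeError(
                f"failed to write {len(errors)} rows, sample errors: {sample}"
            )
        on_errors(errors)
    return errors
```

Keeping the exception as the default preserves the existing behavior for callers that do not opt in to custom handling.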





[jira] [Updated] (KUDU-2722) Ability to mark a partition or table as read only

2019-06-15 Thread Grant Henke (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-2722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-2722:
--
Labels: kudu-roadmap  (was: )

> Ability to mark a partition or table as read only
> -
>
> Key: KUDU-2722
> URL: https://issues.apache.org/jira/browse/KUDU-2722
> Project: Kudu
>  Issue Type: Improvement
>Affects Versions: 1.8.0
>Reporter: Grant Henke
>Priority: Major
>  Labels: kudu-roadmap
>
> It could be useful to prevent data from being mutated in a table or 
> partition. For example, this would allow users to lock older range partitions 
> from receiving inserts/updates/deletes, ensuring any queries/reports running 
> on that data always show the same results.
> There might also be optimization (resource/storage) opportunities we could 
> take advantage of on the server side once a table is marked as read only. 





[jira] [Resolved] (KUDU-2649) can not create table in impala

2019-06-15 Thread Grant Henke (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke resolved KUDU-2649.
---
   Resolution: Invalid
Fix Version/s: n/a

> can not create table in impala
> --
>
> Key: KUDU-2649
> URL: https://issues.apache.org/jira/browse/KUDU-2649
> Project: Kudu
>  Issue Type: Bug
>Reporter: qinzl_1
>Priority: Major
> Fix For: n/a
>
> Attachments: 1.jpg, 2.jpg
>
>
> ERROR: ImpalaRuntimeException: Error creating Kudu table 
> 'impala::kudu_yxy_ods.renew_realtime_log'
> CAUSED BY: NonRecoverableException: too many attempts: 
> KuduRpc(method=IsCreateTableDone, tablet=null, attempt=105, 
> DeadlineTracker(timeout=18, elapsed=179305), Traces: [0ms] sending RPC to 
> server master-serverA1:7051, [7ms] received from server master-serverA1:7051 
> response OK, [20ms] sending RPC to server master-serverA1:7051, [20ms] 
> received from server master-serverA1:7051 response OK, [40ms] sending RPC to 
> server master-serverA1:7051, [40ms] received from server master-serverA1:7051 
> response OK, [59ms] sending RPC to server master-serverA1:7051, [60ms] 
> received from server master-serverA1:7051 response OK, [80ms] sending RPC to 
> server master-serverA1:7051, [80ms] received from server master-serverA1:7051 
> response OK, [100ms] sending RPC to server master-serverA1:7051, [100ms] 
> received from server master-serverA1:7051 response OK, [139ms] sending RPC to 
> server master-serverA1:7051, [140ms] received from server 
> master-serverA1:7051 response OK, [260ms] sending RPC to server 
> master-serverA1:7051, [260ms] received from server master-serverA1:7051 
> response OK, [439ms] sending RPC to server master-serverA1:7051, [440ms] 
> received from server master-serverA1:7051 response OK, [960ms] sending RPC to 
> server master-serverA1:7051, [960ms] received from server 
> master-serverA1:7051 response OK, [1800ms] sending RPC to server 
> master-serverA1:7051, [1801ms] received from server master-serverA1:7051 
> response OK, [3140ms] sending RPC to server master-serverA1:7051, [3140ms] 
> received from server master-serverA1:7051 response OK, [4159ms] sending RPC 
> to server master-serverA1:7051, [4160ms] received from server 
> master-serverA1:7051 response OK, [7400ms] sending RPC to server 
> master-serverA1:7051, [7400ms] received from server master-serverA1:7051 
> response OK, [7820ms] sending RPC to server master-serverA1:7051, [7821ms] 
> received from server master-serverA1:7051 response OK, [9820ms] sending RPC 
> to server master-serverA1:7051, [9820ms] received from server 
> master-serverA1:7051 response OK, [10640ms] sending RPC to server 
> master-serverA1:7051, [10641ms] received from server master-serverA1:7051 
> response OK, [10839ms] sending RPC to server master-serverA1:7051, [10840ms] 
> received from server master-serverA1:7051 response OK, [12439ms] sending RPC 
> to server master-serverA1:7051, [12440ms] received from server 
> master-serverA1:7051 response OK, [12980ms] sending RPC to server 
> master-serverA1:7051, [12981ms] received from server master-serverA1:7051 
> response OK, [15979ms] sending RPC to server master-serverA1:7051, [15980ms] 
> received from server master-serverA1:7051 response OK, [19160ms] sending RPC 
> to server master-serverA1:7051, [19161ms] received from server 
> master-serverA1:7051 response OK, [22540ms] sending RPC to server 
> master-serverA1:7051, [22540ms] received from server master-serverA1:7051 
> response OK, [24680ms] sending RPC to server master-serverA1:7051, [24680ms] 
> received from server master-serverA1:7051 response OK, [26759ms] sending RPC 
> to server master-serverA1:7051, [26760ms] received from server 
> master-serverA1:7051 response OK, [30779ms] sending RPC to server 
> master-serverA1:7051, [30780ms] received from server master-serverA1:7051 
> response OK, [31879ms] sending RPC to server master-serverA1:7051, [31880ms] 
> received from server master-serverA1:7051 response OK, [33039ms] sending RPC 
> to server master-serverA1:7051, [33040ms] received from server 
> master-serverA1:7051 response OK, [34739ms] sending RPC to server 
> master-serverA1:7051, [34740ms] received from server master-serverA1:7051 
> response OK, [35160ms] sending RPC to server master-serverA1:7051, [35160ms] 
> received from server master-serverA1:7051 response OK, [35740ms] sending RPC 
> to server master-serverA1:7051, [35740ms] received from server 
> master-serverA1:7051 response OK, [37680ms] sending RPC to server 
> master-serverA1:7051, [37680ms] received from server master-serverA1:7051 
> response OK, [41499ms] sending RPC to server master-serverA1:7051, [41500ms] 
> received from server master-serverA1:7051 response OK, [44720ms] sending RPC 
> to server master-serverA1:7051, [44720ms] received from server 
> 

[jira] [Updated] (KUDU-2632) Add DATE data type

2019-06-15 Thread Grant Henke (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-2632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-2632:
--
Labels: kudu-roadmap  (was: )

> Add DATE data type
> --
>
> Key: KUDU-2632
> URL: https://issues.apache.org/jira/browse/KUDU-2632
> Project: Kudu
>  Issue Type: New Feature
>  Components: tablet
>Affects Versions: 1.8.0
>Reporter: Adar Dembo
>Priority: Major
>  Labels: kudu-roadmap
>
> {{DATE}} is a timezone-agnostic measure of the current date (without time of 
> day) in YYYY-MM-DD form. The range of values typically supported is 
> 0001-01-01 to 9999-12-31. [Parquet has 
> implemented|https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#date]
>  {{DATE}} as a logical type mapped to int32, represented as the number of 
> days since the start of the UNIX epoch (i.e. January 1, 1970).
> Support for {{DATE}} is currently being added to Impala; we should also add 
> support for it to Kudu.
>  
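
The int32 "days since the UNIX epoch" encoding described in KUDU-2632 can be illustrated with Python's standard `datetime` module. This is a sketch of the encoding only, not of how Kudu or Parquet implement it; the helper names are made up for the example.

```python
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)  # start of the UNIX epoch


def date_to_days(d: date) -> int:
    """Encode a date as the number of days since 1970-01-01 (negative before it)."""
    return (d - EPOCH).days


def days_to_date(days: int) -> date:
    """Decode a day count back into a calendar date."""
    return EPOCH + timedelta(days=days)
```

Every value between Python's `date.min` and `date.max` encodes to a day count that fits comfortably in a signed 32-bit integer, which is why int32 is a sufficient physical type.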





[jira] [Updated] (KUDU-2483) Scan tablets with bloom filter

2019-06-15 Thread Grant Henke (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-2483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-2483:
--
Labels: kudu-roadmap  (was: )

> Scan tablets with bloom filter
> --
>
> Key: KUDU-2483
> URL: https://issues.apache.org/jira/browse/KUDU-2483
> Project: Kudu
>  Issue Type: New Feature
>  Components: client
>Reporter: jin xing
>Priority: Major
>  Labels: kudu-roadmap
> Attachments: BloomFilter+Design+Doc.pdf, KUDU-2483, 
> image-2018-07-01-23-29-05-517.png
>
>
> Joins are very common in Spark SQL. In this JIRA I take broadcast join as an 
> example and describe how a Kudu bloom filter can help accelerate distributed 
> computing.
> Spark runs a broadcast join with the following steps:
>  1. A broadcast join involves a small table and a big table; Spark reads all 
> data from the small table to one worker and builds a hash table;
>  2. The hash table generated in step 1 is broadcast to all the workers, 
> which will read the splits of the big table;
>  3. Workers fetch and iterate over all the splits of the big table and check 
> whether the joining keys exist in the hash table; only rows with matching 
> joining keys are retained.
> Step 3 is the heaviest, especially when the worker and the split storage are 
> not on the same host and bandwidth is limited. The cost of step 3 is not 
> always necessary. Consider the following scenario:
> {code:none}
> Small table A
> id      name
> 1      Jin
> 6      Xing
> Big table B
> id     age
> 1      10
> 2      21
> 3      33
> 4      65
> 5      32
> 6      23
> 7      18
> 8      20
> 9      22
> {code}
> Run the query with SQL: *select * from A inner join B on A.id=B.id*
> Clearly we don't need to fetch all the data from table B, because the number 
> of matched keys is really small.
> I propose to use the small table to build a bloom filter (BF) and use the 
> generated BF as a predicate/filter to fetch data from the big table, thus:
>  1. Much traffic/bandwidth is saved.
>  2. Less data has to be processed by the workers.
> Broadcast join is just an example; other types of join will also benefit if 
> we scan with a BF.
> In a nutshell, I think Kudu can provide an interface by which users can scan 
> data with bloom filters.
>  
> Here I want to add some statistics for the Spark-Kudu integration with and 
> without BloomFilter.
> In our production environment the bandwidth of each executor is 50M bps.
> We do an inner join of two tables: one is large and the other is 
> comparatively small.
> In Spark, an inner join can be implemented as SortMergeJoin or 
> BroadcastHashJoin; we implemented the corresponding operators with 
> BloomFilter as SortMergeBloomFilterJoin and BroadcastBloomFilterJoin.
> The hash table of the BloomFilter is configured as 32M. 
> The statistics are shown below:
> ||Records of Table A||Records of Table B||Join Operator||Executor Time||
> |400 thousand|14 billion|SortMergeJoin|760s|
> |400 thousand|14 billion|BroadcastHashJoin|376s|
> |400 thousand|14 billion|BroadcastBloomFilterJoin|21s|
> |2 million|14 billion|SortMergeJoin|707s|
> |2 million|14 billion|BroadcastHashJoin|329s|
> |2 million|14 billion|SortMergeBloomFilterJoin|75s|
> |2 million|14 billion|BroadcastBloomFilterJoin|35s|
> As we can see, the join benefits a lot from BloomFilter pushdown. 
> I want to use this JIRA as an umbrella; my teammates will submit the 
> follow-up sub-tasks/PRs.
> It would be great if someone could take a closer look at this and share 
> comments. 
>  
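
The proposal in KUDU-2483 can be sketched with the example tables A and B from the description. This is a toy, single-process illustration: the `BloomFilter` class is a hypothetical minimal implementation that uses SHA-256 from the standard library rather than whatever hash Kudu uses, and the "scan" is just a dict comprehension standing in for a scan with a pushed-down predicate.

```python
import hashlib


class BloomFilter:
    """Tiny bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        data = repr(key).encode("utf-8")
        for seed in range(self.num_hashes):
            # One salted SHA-256 per hash function (illustrative choice only).
            digest = hashlib.sha256(bytes([seed]) + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )


small_a = {1: "Jin", 6: "Xing"}
big_b = {1: 10, 2: 21, 3: 33, 4: 65, 5: 32, 6: 23, 7: 18, 8: 20, 9: 22}

# Build the filter from the small table's join keys.
bf = BloomFilter()
for key in small_a:
    bf.add(key)

# "Scan" the big table with the filter as a predicate; only candidate rows
# would cross the network in the real setting.
candidates = {k: age for k, age in big_b.items() if bf.might_contain(k)}

# The exact join still runs on the reduced candidate set, which removes any
# false positives the filter let through.
joined = {k: (small_a[k], age) for k, age in candidates.items() if k in small_a}
```

Because a bloom filter never produces false negatives, every matching key survives the filtered scan, and the final exact join keeps the result correct even if a few extra rows slip through.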





[jira] [Updated] (KUDU-2614) Implement asynchronous replication

2019-06-15 Thread Grant Henke (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-2614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-2614:
--
Labels: kudu-roadmap  (was: )

> Implement asynchronous replication
> --
>
> Key: KUDU-2614
> URL: https://issues.apache.org/jira/browse/KUDU-2614
> Project: Kudu
>  Issue Type: Task
>Reporter: Mike Percy
>Priority: Major
>  Labels: kudu-roadmap
>
> Implement asynchronous cluster-to-cluster replication (across WAN links) for 
> Kudu.





[jira] [Updated] (KUDU-2613) Implement secondary indexes

2019-06-15 Thread Grant Henke (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-2613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-2613:
--
Labels: kudu-roadmap  (was: )

> Implement secondary indexes
> ---
>
> Key: KUDU-2613
> URL: https://issues.apache.org/jira/browse/KUDU-2613
> Project: Kudu
>  Issue Type: Task
>Reporter: Mike Percy
>Priority: Major
>  Labels: kudu-roadmap
>
> Tracking Jira to implement secondary indexes in Kudu





[jira] [Updated] (KUDU-2612) Implement multi-row transactions

2019-06-15 Thread Grant Henke (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-2612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-2612:
--
Labels: kudu-roadmap  (was: )

> Implement multi-row transactions
> 
>
> Key: KUDU-2612
> URL: https://issues.apache.org/jira/browse/KUDU-2612
> Project: Kudu
>  Issue Type: Task
>Reporter: Mike Percy
>Priority: Major
>  Labels: kudu-roadmap
>
> Tracking Jira to implement multi-row / multi-table transactions in Kudu.





[jira] [Resolved] (KUDU-2533) Add bloomfilter predicate in server side.

2019-06-15 Thread Grant Henke (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-2533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke resolved KUDU-2533.
---
   Resolution: Fixed
Fix Version/s: 1.9.0

Resolved via 
[https://github.com/apache/kudu/commit/8af288a26a204e2acfc3aa4e642fba7de56b43bb]

> Add  bloomfilter predicate in server side.
> --
>
> Key: KUDU-2533
> URL: https://issues.apache.org/jira/browse/KUDU-2533
> Project: Kudu
>  Issue Type: Sub-task
>  Components: tablet
>Affects Versions: 1.8.0
>Reporter: ZhangYao
>Assignee: ZhangYao
>Priority: Major
> Fix For: 1.9.0
>
>
> Add a new bloom filter predicate for scans. It is currently based on CityHash, 
> which is already implemented on the server side. When a scan comes with a 
> bloom filter, it should set the bloom filter data and the number of hash 
> functions, which the server then uses to form a bloom filter to filter columns.





[jira] [Updated] (KUDU-2518) SparkSQL queries without temporary tables

2019-06-15 Thread Grant Henke (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-2518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-2518:
--
Labels: kudu-roadmap  (was: )

> SparkSQL queries without temporary tables
> -
>
> Key: KUDU-2518
> URL: https://issues.apache.org/jira/browse/KUDU-2518
> Project: Kudu
>  Issue Type: Improvement
>  Components: hms, spark
>Affects Versions: 1.7.1
>Reporter: Dan Burkert
>Priority: Major
>  Labels: kudu-roadmap
>
> One long-standing ergonomic issue with the Kudu/SparkSQL integration is the 
> requirement to register Kudu tables as temp tables before they can be scanned 
> using a SQL string ({{sql("SELECT * FROM my_kudu_table")}}).  Ideally 
> SparkSQL could query Kudu tables that it discovers via the HMS with no 
> additional configuration.  Yesterday I explored what it would take to get 
> there, and I found some interesting things.
>  
> If the HMS table contains a {{spark.sql.sources.provider}} table property 
> with a value like {{org.apache.kudu.spark.kudu.DefaultSource}}, SparkSQL will 
> automatically instantiate the corresponding {{RelationProvider}} class, 
> passing a {{SQLContext}} and a map of parameters, which it fills in with the 
> table's HDFS URI and storage properties.  The current plan for Kudu + HMS 
> integration (KUDU-2191) is not to set any storage properties; instead, 
> attributes like master addresses and table ID will be stored as table 
> properties.  As a result, SparkSQL is instantiating a Kudu {{DefaultSource}}, 
> but it doesn't pass necessary arguments like the table name or master 
> addresses.   Getting this far required adding a dummy 
> {{org.apache.kudu.hive.KuduStorageHandler}} class to the classpath so that 
> the Hive client wouldn't choke on the bogus class name.  The stacktrace from 
> Spark attempting to instantiate the {{DefaultSource}} is provided below.
>  
> {code:java}
> Spark context Web UI available at http://kudu-hms-1.gce.cloudera.com:4041
> Spark context available as 'sc' (master = local[*], app id = 
> local-1532719985143).
> Spark session available as 'spark'.
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 2.3.1
>       /_/
>      
> Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_181)
> Type in expressions to have them evaluated.
> Type :help for more information.
> scala> sql("DESCRIBE TABLE t1")
> org.spark_project.guava.util.concurrent.UncheckedExecutionException: 
> java.lang.IllegalArgumentException: Kudu table name must be specified in 
> create options using key 'kudu.table'.  parameters: Map(), parameters-size: 
> 0, parameters-keys: Set(), path: None
>   at 
> org.spark_project.guava.cache.LocalCache$Segment.get(LocalCache.java:2263)
>   at org.spark_project.guava.cache.LocalCache.get(LocalCache.java:4000)
>   at 
> org.spark_project.guava.cache.LocalCache$LocalManualCache.get(LocalCache.java:4789)
>   at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.getCachedPlan(SessionCatalog.scala:137)
>   at 
> org.apache.spark.sql.execution.datasources.FindDataSourceTable.org$apache$spark$sql$execution$datasources$FindDataSourceTable$$readDataSourceTable(DataSourceStrategy.scala:227)
>   at 
> org.apache.spark.sql.execution.datasources.FindDataSourceTable$$anonfun$apply$2.applyOrElse(DataSourceStrategy.scala:264)
>   at 
> org.apache.spark.sql.execution.datasources.FindDataSourceTable$$anonfun$apply$2.applyOrElse(DataSourceStrategy.scala:255)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:266)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:256)
>   at 
> org.apache.spark.sql.execution.datasources.FindDataSourceTable.apply(DataSourceStrategy.scala:255)
>   at 
> org.apache.spark.sql.execution.datasources.FindDataSourceTable.apply(DataSourceStrategy.scala:223)
>   at 
> 

[jira] [Updated] (KUDU-2490) implement Kudu DataSourceV2 and related classes

2019-06-15 Thread Grant Henke (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-2490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-2490:
--
Labels: kudu-roadmap  (was: )

> implement Kudu DataSourceV2 and related classes
> ---
>
> Key: KUDU-2490
> URL: https://issues.apache.org/jira/browse/KUDU-2490
> Project: Kudu
>  Issue Type: Improvement
>  Components: spark
>Reporter: Andrew Wong
>Priority: Major
>  Labels: kudu-roadmap
>
> The current Kudu-Spark bindings implement a DefaultSource that extends a 
> RelationProvider, which provides BaseRelations to Spark, which, as I 
> understand it, are physical units of query execution and represent sets of 
> rows. The Kudu BaseRelation (the KuduRelation) implements a couple of traits 
> to fit into Spark: PrunedFilteredScan, which allows predicates to be pushed 
> into Kudu, and InsertableRelation, which allows writes to be pushed into 
> Kudu. An issue with these bindings is that, while they provide interfaces to 
> insert/get data, they do not provide interfaces to push details to Spark that 
> might be useful for optimizing a Kudu query.
> Among other things, this is inconvenient for all datasources that might want 
> to take such optimizations into their own hands, and the Spark community 
> appears to be making efforts in revamping their DataSource APIs in the form 
> of DataSourceV2, and as it pertains to read support, the v2 DataSourceReader. 
> This new world order provides a clear path towards implementing various 
> optimizations that are currently unavailable with the current Spark bindings, 
> without pushing changes to Spark itself.
> Of note, the v2 DataSourceReader can be extended with 
> SupportsReportStatistics, which could allow Kudu to expose statistics to Spark 
> without having to rely on HMS (although pushing stats to HMS isn't an 
> unreasonable approach either). More traits and details about the API can be 
> found 
> [here|https://spark.apache.org/docs/2.3.0/api/java/org/apache/spark/sql/sources/v2/reader/DataSourceReader.html].





[jira] [Updated] (KUDU-2019) Expose table/column statistics

2019-06-15 Thread Grant Henke (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-2019:
--
Labels: impala kudu-roadmap  (was: impala)

> Expose table/column statistics
> --
>
> Key: KUDU-2019
> URL: https://issues.apache.org/jira/browse/KUDU-2019
> Project: Kudu
>  Issue Type: New Feature
>  Components: client
>Affects Versions: 1.0.0
>Reporter: Matthew Jacobs
>Priority: Major
>  Labels: impala, kudu-roadmap
>
> It would be helpful for query engines such as Impala to get table/column 
> statistics from Kudu.





[jira] [Commented] (KUDU-2347) Java tests should use unique dirs for minicluster data

2019-06-15 Thread Grant Henke (JIRA)


[ 
https://issues.apache.org/jira/browse/KUDU-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16864832#comment-16864832
 ] 

Grant Henke commented on KUDU-2347:
---

I think this has been resolved in the kudu-test-utils changes. Please re-open 
if not. 

> Java tests should use unique dirs for minicluster data
> --
>
> Key: KUDU-2347
> URL: https://issues.apache.org/jira/browse/KUDU-2347
> Project: Kudu
>  Issue Type: Improvement
>  Components: java, test
>Reporter: Todd Lipcon
>Priority: Major
>  Labels: newbie
>
> Currently the Java minicluster just uses TEST_TMPDIR/minicluster-data as a 
> root for its miniclusters. I found some case in which one test failed to 
> clean up that directory and then the next test failed to start because it 
> found existing on-disk data with a different set of master hosts/IPs. It also 
> prevents running multiple instances of the tests in parallel.
> We should have the Java tests pass a unique directory name, based on the 
> current test being executed, into the minicluster tool to avoid these 
> collisions.
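
The idea in KUDU-2347 is to derive a fresh data root per test instead of sharing one fixed directory. The sketch below shows that idea with Python's standard library; it is not the kudu-test-utils implementation, and the function name and the `minicluster-data-` prefix are assumptions made for the example.

```python
import os
import tempfile


def unique_cluster_root(base_dir: str, test_name: str) -> str:
    """Create and return a fresh directory under base_dir for one test run.

    mkdtemp appends a random suffix, so two runs of the same test (even in
    parallel) never share on-disk state. test_name is assumed to contain no
    path separators.
    """
    os.makedirs(base_dir, exist_ok=True)
    return tempfile.mkdtemp(prefix=f"minicluster-data-{test_name}-", dir=base_dir)
```

Including the test name in the prefix keeps leftover directories easy to attribute when a test fails to clean up.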





[jira] [Resolved] (KUDU-2347) Java tests should use unique dirs for minicluster data

2019-06-15 Thread Grant Henke (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke resolved KUDU-2347.
---
   Resolution: Fixed
Fix Version/s: n/a

> Java tests should use unique dirs for minicluster data
> --
>
> Key: KUDU-2347
> URL: https://issues.apache.org/jira/browse/KUDU-2347
> Project: Kudu
>  Issue Type: Improvement
>  Components: java, test
>Reporter: Todd Lipcon
>Priority: Major
>  Labels: newbie
> Fix For: n/a
>
>
> Currently the Java minicluster just uses TEST_TMPDIR/minicluster-data as a 
> root for its miniclusters. I found some case in which one test failed to 
> clean up that directory and then the next test failed to start because it 
> found existing on-disk data with a different set of master hosts/IPs. It also 
> prevents running multiple instances of the tests in parallel.
> We should have the Java tests pass a unique directory name, based on the 
> current test being executed, into the minicluster tool to avoid these 
> collisions.





[jira] [Updated] (KUDU-2252) Design and implement native encryption at rest

2019-06-15 Thread Grant Henke (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-2252:
--
Labels: kudu-roadmap  (was: )

> Design and implement native encryption at rest
> --
>
> Key: KUDU-2252
> URL: https://issues.apache.org/jira/browse/KUDU-2252
> Project: Kudu
>  Issue Type: New Feature
>  Components: fs, security
>Reporter: Mike Percy
>Priority: Major
>  Labels: kudu-roadmap
>
> It would be beneficial for Kudu to support native encryption at rest.
> While the underlying filesystem can be encrypted with Kudu on top, there are 
> benefits to native encryption at rest:
> * With native encryption support, it may be possible to specify different 
> encryption keys for different objects or object trees (such as is possible 
> with HDFS encryption)
> * It may be possible to share a single block device with other storage, 
> including HDFS (note that some Linux setups allow for "splitting" a device)





[jira] [Updated] (KUDU-1994) Automatically Create New Range Partitions When Needed

2019-06-15 Thread Grant Henke (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-1994:
--
Labels: kudu-roadmap  (was: )

> Automatically Create New Range Partitions When Needed
> -
>
> Key: KUDU-1994
> URL: https://issues.apache.org/jira/browse/KUDU-1994
> Project: Kudu
>  Issue Type: Improvement
>Affects Versions: 1.3.0
>Reporter: Alan Jackoway
>Priority: Major
>  Labels: kudu-roadmap
>
> We have a few Kudu tables where we use a range-partitioned timestamp as part 
> of the key. The intention of this is to keep data locality for data that is 
> likely to be scanned together, such as events in a timeseries.
> Currently we create these with partitions that look like this:
> {noformat}
> RANGE (ts) (
> PARTITION 0 <= VALUES < 142008840,
> PARTITION 142008840 <= VALUES < 142786080,
> PARTITION 142786080 <= VALUES < 143572320,
> PARTITION 143572320 <= VALUES < 144367200,
> PARTITION 144367200 <= VALUES < 145162440,
> PARTITION 145162440 <= VALUES < 145948320,
> PARTITION 145948320 <= VALUES < 146734560,
> PARTITION 146734560 <= VALUES < 147529440,
> PARTITION 147529440 <= VALUES < 148324680,
> PARTITION 148324680 <= VALUES < 149103360,
> PARTITION 149103360 <= VALUES < 149889600,
> PARTITION 149889600 <= VALUES < 150684480
> )
> {noformat}
> The problem is that as time goes on we have to choose to either create empty 
> partitions in advance of when we are writing data or risk forgetting to 
> create a partition and having writes of new data fail.
> Ideally, Kudu would have a way to indicate the size of the partitions (in 
> this example 3 months converted to milliseconds) and then automatically 
> create new partitions when new data comes in that needs the partition.
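The fixed-width scheme requested above could be sketched roughly as follows. This is a minimal illustration in plain Python, not Kudu client code: `partition_bounds`, `ensure_partition`, and the in-memory `existing` set are hypothetical names, and the width constant only approximates "3 months" as 91 days.

```python
# Hypothetical sketch: derive fixed-width range partition bounds from a timestamp.
# Nothing here calls the Kudu API; `existing` stands in for the table's partitions.

PARTITION_WIDTH_MS = 91 * 24 * 60 * 60 * 1000  # assumption: ~3 months as 91 days


def partition_bounds(ts_ms, width_ms=PARTITION_WIDTH_MS):
    """Return the [lower, upper) bounds of the partition that covers ts_ms."""
    if ts_ms < 0:
        raise ValueError("timestamp must be non-negative")
    lower = (ts_ms // width_ms) * width_ms
    return lower, lower + width_ms


def ensure_partition(existing, ts_ms, width_ms=PARTITION_WIDTH_MS):
    """Add the covering partition to `existing` if absent; return (bounds, created)."""
    bounds = partition_bounds(ts_ms, width_ms)
    created = bounds not in existing
    if created:
        existing.add(bounds)
    return bounds, created
```

With this shape, a write path would call `ensure_partition` before inserting a row, so a missing partition is created on demand rather than causing the write to fail.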





[jira] [Assigned] (KUDU-2191) Hive Metastore Integration

2019-06-15 Thread Grant Henke (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke reassigned KUDU-2191:
-

Assignee: Hao Hao  (was: Dan Burkert)

> Hive Metastore Integration
> --
>
> Key: KUDU-2191
> URL: https://issues.apache.org/jira/browse/KUDU-2191
> Project: Kudu
>  Issue Type: New Feature
>  Components: server
>Affects Versions: 1.5.0
>Reporter: Dan Burkert
>Assignee: Hao Hao
>Priority: Major
>
> In order to facilitate discovery of Kudu tables, as well as a shared table 
> namespace, Kudu should register its tables in the Hive Metastore.





[jira] [Updated] (KUDU-2181) Multi-master config change support

2019-06-15 Thread Grant Henke (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-2181:
--
Labels: kudu-roadmap  (was: )

> Multi-master config change support
> --
>
> Key: KUDU-2181
> URL: https://issues.apache.org/jira/browse/KUDU-2181
> Project: Kudu
>  Issue Type: Improvement
>  Components: consensus, master
>Reporter: Mike Percy
>Priority: Major
>  Labels: kudu-roadmap
>
> It would be very useful to add support to the Kudu master for dynamic config 
> change. The current procedure for replacing a failed master is fairly arduous.





[jira] [Resolved] (KUDU-2146) Tool to determine the leader master

2019-06-15 Thread Grant Henke (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-2146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke resolved KUDU-2146.
---
   Resolution: Duplicate
Fix Version/s: n/a

> Tool to determine the leader master
> ---
>
> Key: KUDU-2146
> URL: https://issues.apache.org/jira/browse/KUDU-2146
> Project: Kudu
>  Issue Type: Improvement
>  Components: master
>Reporter: Andrew Wong
>Priority: Major
> Fix For: n/a
>
>
> Going through some docs regarding multi-master migration, it seems like some 
> processes are warned against in order to prevent data loss.
> As an example, adding masters to an existing multi-master deployment may mess 
> up the deployment and lose ops if new masters are added using a stale 
> follower as its "reference" master (i.e. the existing master from which data 
> is copied to the new master). As such, the docs warn against doing this 
> migration entirely, when it _should_ be safe to add the new masters using the 
> most up-to-date master as the reference master.
> It would thus be helpful to be able to determine which master is leader, or 
> at least has the highest op-id (finding the leader may be harder to figure 
> out if the masters are down).





[jira] [Updated] (KUDU-2362) First class database support

2019-06-15 Thread Grant Henke (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-2362:
--
Labels: kudu-roadmap  (was: )

> First class database support
> 
>
> Key: KUDU-2362
> URL: https://issues.apache.org/jira/browse/KUDU-2362
> Project: Kudu
>  Issue Type: New Feature
>Reporter: Grant Henke
>Priority: Major
>  Labels: kudu-roadmap
>
> This Jira tracks the tasks of adding first class "real" database 
> support to Kudu. 
> Currently Impala maps databases to Kudu by prepending the database name to 
> the table name. With HMS integration (KUDU-2191) we see a similar pattern. 
> Looking forward, authorization and backup features could also leverage the 
> concept of a database. 
> It may be worth implementing this now to prevent tricky compatibility 
> issues with all of these features. (We could map all tables to a default 
> database, but we would already need to handle the Impala naming format and 
> migration.) 
>  





[jira] [Updated] (KUDU-2077) Return data in Apache Arrow format

2019-06-15 Thread Grant Henke (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-2077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-2077:
--
Labels: kudu-roadmap  (was: )

> Return data in Apache Arrow format
> --
>
> Key: KUDU-2077
> URL: https://issues.apache.org/jira/browse/KUDU-2077
> Project: Kudu
>  Issue Type: New Feature
>  Components: client, server
>Reporter: Andrew Wong
>Priority: Major
>  Labels: kudu-roadmap
>
> Dan and I spent the hackathon tinkering with the Apache Arrow format. Arrow 
> is an in-memory columnar format designed to be the common data format for a 
> large number of projects, see [here|https://arrow.apache.org]. One place we 
> thought adding this would be particularly fitting is when sending results 
> back to the client, since this currently returns row-wise data. By returning 
> Arrow, this could open the door to simpler and faster integration with other 
> projects.
> The server-side changes can be localized to the tablet service and wire 
> protocol. We considered using Arrow more exhaustively throughout the server 
> codebase, but found that because Arrow and Kudu's own in-memory format (i.e. 
> that in kudu::ColumnBlock) are so similar, a simpler approach is to copy the 
> buffers from ColumnBlock to the scan response and build arrow::Arrays 
> client-side. A POC of the server-side changes can be found here: 
> https://github.com/danburkert/kudu/tree/arrow
> At the time of writing this, the arrow::Array type has a varying number of 
> arrow::Buffers, depending on the data type (e.g. one for null bitmaps, one 
> for data, etc). The ColumnBlock "buffers" (i.e. data, null_bitmap) should be 
> compatible with these Buffers with a couple of modifications:
> * The null-bitmaps in Arrow are the complement of those used by Kudu
> * The RowBlock that owns the ColumnBlocks has a selection vector that needs to be 
> accounted for
> If the buffers are transferred over the wire (via sidecars or protobuf), they 
> should be able to be converted to Arrays via arrow::ArrayData or directly via 
> the Array constructors.
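The two modifications listed above can be illustrated with a small sketch. It is plain Python over byte strings, not Kudu or Arrow code: the function names are hypothetical, and it assumes both bitmaps are stored LSB-first with one bit per row. Arrow's validity bitmap sets a bit for each non-null value; per the description, Kudu's bitmap is the complement.

```python
# Hypothetical sketch of the null-bitmap conversion described above.
# Assumption: both bitmaps are LSB-first byte strings, one bit per row.


def get_bit(bitmap, i):
    """Return bit i (0 or 1) of an LSB-first bitmap."""
    return (bitmap[i // 8] >> (i % 8)) & 1


def complement_bitmap(bitmap, num_bits):
    """Flip the first num_bits bits; padding bits in the last byte stay zero."""
    num_bytes = (num_bits + 7) // 8
    if len(bitmap) < num_bytes:
        raise ValueError("bitmap is shorter than num_bits requires")
    out = bytearray((~b) & 0xFF for b in bitmap[:num_bytes])
    tail = num_bits % 8
    if tail and out:
        out[-1] &= (1 << tail) - 1  # clear bits beyond num_bits
    return bytes(out)


def selected_indices(selection, num_rows):
    """Row indices whose selection-vector bit is set (rows that survive the scan)."""
    return [i for i in range(num_rows) if get_bit(selection, i)]
```

A real conversion would also have to compact the data buffer down to the selected rows; the sketch only shows which rows would be kept and how the bitmap polarity flips.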





[jira] [Updated] (KUDU-2054) Rolling Restart and Upgrade

2019-06-15 Thread Grant Henke (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-2054:
--
Labels: kudu-roadmap  (was: )

> Rolling Restart and Upgrade
> ---
>
> Key: KUDU-2054
> URL: https://issues.apache.org/jira/browse/KUDU-2054
> Project: Kudu
>  Issue Type: Wish
>Reporter: Alan Jackoway
>Priority: Major
>  Labels: kudu-roadmap
>
> It would be helpful for our operations if Kudu supported rolling restart and 
> rolling upgrade by some process of restarting tablet servers + masters one by 
> one or in batches less than the replication size.
> This would allow us to continue using Kudu during an upgrade or restart the 
> way we can with HBase or HDFS.





[jira] [Updated] (KUDU-1967) Umbrella JIRA for node density improvements

2019-06-15 Thread Grant Henke (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-1967:
--
Labels: data-scalability kudu-roadmap  (was: data-scalability)

> Umbrella JIRA for node density improvements
> ---
>
> Key: KUDU-1967
> URL: https://issues.apache.org/jira/browse/KUDU-1967
> Project: Kudu
>  Issue Type: Task
>  Components: fs, master, tablet, tserver
>Affects Versions: 1.3.0
>Reporter: Adar Dembo
>Assignee: Adar Dembo
>Priority: Major
>  Labels: data-scalability, kudu-roadmap
>
> For the Kudu 1.4 release, I'll be working to improve node density.
> Here's a brief primer on Kudu's scalability targets today:
> # We recommend no more than 4 TB of total data per node. This is specific to 
> Kudu data blocks, so this data is post-encoding and post-compression.
> # We recommend no more than 1000 partitions (post-replication) per node.
> # We recommend no more than 100 nodes per cluster.
> # We recommend no more than 60 partitions per table per tserver.
> For 1.4, here's what we'd like to achieve:
> # Up to 16 TB of total data per node. Maybe even 48 TB, if possible.
> # Up to 100 "hot" partitions per node. In this context, "hot" means 
> partitions that are actively servicing writes.
> # Thousands of "cold" partitions per node. Put another way, it should be 
> drastically cheaper to serve "cold" partitions than it is today.
> # Maintain the "100 nodes per cluster" limit.
> # Remove the "no more than 60 partitions per table per node" limit.
> I'll be linking various interesting JIRAs into this one, and I'll document, 
> for each one, which aspect of data scalability it affects.





[jira] [Updated] (KUDU-1938) Support for VARCHAR type

2019-06-15 Thread Grant Henke (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-1938:
--
Labels: kudu-roadmap  (was: )

> Support for VARCHAR type
> 
>
> Key: KUDU-1938
> URL: https://issues.apache.org/jira/browse/KUDU-1938
> Project: Kudu
>  Issue Type: New Feature
>  Components: client, tablet
>Reporter: Farzana Kader
>Priority: Major
>  Labels: kudu-roadmap
>
> VARCHAR is currently not supported by Kudu.  This is functionality that 
> currently exists in Impala. Some client applications convert STRING to 32K 
> bytes which causes performance issues so they need the VARCHAR support in 
> order to integrate well with Kudu. 





[jira] [Updated] (KUDU-1938) Support for VARCHAR and CHAR type

2019-06-15 Thread Grant Henke (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-1938:
--
Summary: Support for VARCHAR and CHAR type  (was: Support for VARCHAR type)

> Support for VARCHAR and CHAR type
> -
>
> Key: KUDU-1938
> URL: https://issues.apache.org/jira/browse/KUDU-1938
> Project: Kudu
>  Issue Type: New Feature
>  Components: client, tablet
>Reporter: Farzana Kader
>Priority: Major
>  Labels: kudu-roadmap
>
> VARCHAR is currently not supported by Kudu.  This is functionality that 
> currently exists in Impala. Some client applications convert STRING to 32K 
> bytes which causes performance issues so they need the VARCHAR support in 
> order to integrate well with Kudu. 





[jira] [Resolved] (KUDU-1808) Simplify KuduSession mutation buffer/flush watermark pct API

2019-06-15 Thread Grant Henke (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-1808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke resolved KUDU-1808.
---
   Resolution: Won't Fix
Fix Version/s: n/a

The setMutationBufferLowWatermark was deprecated/removed.

> Simplify KuduSession mutation buffer/flush watermark pct API
> 
>
> Key: KUDU-1808
> URL: https://issues.apache.org/jira/browse/KUDU-1808
> Project: Kudu
>  Issue Type: Improvement
>  Components: client
>Affects Versions: 1.0.0
>Reporter: Matthew Jacobs
>Assignee: Alexey Serbin
>Priority: Major
> Fix For: n/a
>
>
> The API to configure the KuduSession memory is very confusing. It's not 
> obvious how to pick good values for {{SetMutationBufferSpace()}}, 
> {{SetMutationBufferMaxNum()}}, and {{SetMutationBufferFlushWatermark()}}.  My 
> understanding is that a user might:
> * pick some amount of memory _M_ to be used for the mutation buffer space, 
> though this isn't obvious (high values increase throughput, but too high and 
> the writes may overwhelm tservers)
> * set the flush watermark pct to (1/_NumBuffers_) where _NumBuffers_ = ( _M_ 
> / 7MB) , and where 7MB is Kudu's internal  buffer size
>   (formula from [~tlipcon])
> * call SetMutationBufferMaxNum(0), since the max buffers limit wouldn't be necessary.
> This is what Impala does at the moment, but this is very complicated.
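The sizing rule quoted above can be written out as a short calculation. This is only a sketch of the arithmetic, not a Kudu API: `flush_watermark_fraction` is a hypothetical helper, the 7MB figure is taken from the description and treated here as 7 MiB, and rounding the buffer count down (with a floor of one) is an assumption.

```python
# Hypothetical sketch of the sizing formula quoted above; not a Kudu API.

INTERNAL_BUFFER_BYTES = 7 * 1024 * 1024  # the 7MB figure from the description


def flush_watermark_fraction(mutation_buffer_bytes,
                             internal_buffer_bytes=INTERNAL_BUFFER_BYTES):
    """Return 1/NumBuffers, where NumBuffers = M / internal buffer size."""
    if mutation_buffer_bytes <= 0:
        raise ValueError("mutation buffer space must be positive")
    # Assumption: round the buffer count down, but never below one buffer.
    num_buffers = max(1, mutation_buffer_bytes // internal_buffer_bytes)
    return 1.0 / num_buffers
```

For example, a 70 MiB mutation buffer gives ten internal buffers and a watermark fraction of 0.1; the caller would still need to convert that fraction into whatever unit the session API expects.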





[jira] [Updated] (KUDU-1827) Add capability and tool to gracefully decommission a tablet server

2019-06-15 Thread Grant Henke (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-1827:
--
Labels: kudu-roadmap  (was: )

> Add capability and tool to gracefully decommission a tablet server
> --
>
> Key: KUDU-1827
> URL: https://issues.apache.org/jira/browse/KUDU-1827
> Project: Kudu
>  Issue Type: New Feature
>  Components: consensus, ops-tooling, tablet
>Reporter: Mike Percy
>Priority: Major
>  Labels: kudu-roadmap
>
> It would be useful to be able to decommission a tablet server gracefully, by 
> removing it as a candidate for getting new tablets and re-replicating its 
> data to other servers before taking it out of tablet Raft configurations and 
> shutting it down.





[jira] [Updated] (KUDU-1829) Support cross-datacenter replication

2019-06-15 Thread Grant Henke (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-1829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-1829:
--
Labels: kudu-roadmap  (was: )

> Support cross-datacenter replication
> 
>
> Key: KUDU-1829
> URL: https://issues.apache.org/jira/browse/KUDU-1829
> Project: Kudu
>  Issue Type: New Feature
>  Components: consensus
>Reporter: Mike Percy
>Priority: Major
>  Labels: kudu-roadmap
>
> We would like to ultimately support cross-datacenter replication.





[jira] [Updated] (KUDU-1945) Support generation of surrogate primary keys (or tables with no PK)

2019-06-15 Thread Grant Henke (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-1945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-1945:
--
Labels: kudu-roadmap  (was: )

> Support generation of surrogate primary keys (or tables with no PK)
> ---
>
> Key: KUDU-1945
> URL: https://issues.apache.org/jira/browse/KUDU-1945
> Project: Kudu
>  Issue Type: New Feature
>  Components: client, master, tablet
>Reporter: Todd Lipcon
>Assignee: Grant Henke
>Priority: Major
>  Labels: kudu-roadmap
>
> Many use cases have data where there is no "natural" primary key. For 
> example, a web log use case mostly cares about partitioning and not about 
> precise sorting by timestamp, and timestamps themselves are not necessarily 
> unique. Rather than forcing users to come up with their own surrogate primary 
> keys, Kudu should support some kind of "auto_increment" equivalent which 
> generates primary keys on insertion. Alternatively, Kudu could support tables 
> which are partitioned but not internally sorted.
> The advantages would be:
> - Kudu can pick primary keys on insertion to guarantee that there is no 
> compaction required on the table (eg always assign a new key higher than any 
> existing key in the local tablet). This can improve write throughput 
> substantially, especially compared to naive PK generation schemes that a user 
> might pick such as UUID, which would generate a uniform random-insert 
> workload (worst case for performance)
> - Make Kudu easier to use for such use cases (no extra client code necessary)





[jira] [Resolved] (KUDU-1902) Server does not start if NTP clock is unsynchronized

2019-06-15 Thread Grant Henke (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-1902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke resolved KUDU-1902.
---
   Resolution: Duplicate
Fix Version/s: n/a

> Server does not start if NTP clock is unsynchronized
> 
>
> Key: KUDU-1902
> URL: https://issues.apache.org/jira/browse/KUDU-1902
> Project: Kudu
>  Issue Type: Improvement
>  Components: master, tserver
>Affects Versions: 1.2.0
> Environment: Kudu 1.2, CDH 5.10.0, RHEL 7.2
>Reporter: YulongZ
>Priority: Major
> Fix For: n/a
>
> Attachments: kudu-master.INFO
>
>
> First, all the nodes in the Kudu cluster are NTP synchronised, but 
> ntp_gettime() and ntp_adjtime() return an error code. No server can start.
> [deployer@ZRR-POC419-39 ~]$ ntpq -p
>      remote           refid      st t when poll reach   delay   offset  jitter
> ==============================================================================
> *ZRR-POC419-36   LOCAL(0)         9 u   85  256  377    0.060   -0.209   0.041
> [deployer@ZRR-POC419-39 ~]$ ntpstat 
> synchronised to NTP server (10.162.158.36) at stratum 10 
>time correct to within 42 ms
>polling server every 256 s
> [deployer@ZRR-POC419-39 ~]$ ntptime 
> ntp_gettime() returns code 5 (ERROR)
>   time dc60206a.9f32acc0  Wed, Mar  1 2017  0:14:34.621, (.621867536),
>   maximum error 1600 us, estimated error 16 us, TAI offset 0
> ntp_adjtime() returns code 5 (ERROR)
>   modes 0x0 (),
>   offset 0.000 us, frequency 8.545 ppm, interval 1 s,
>   maximum error 1600 us, estimated error 16 us,
>   status 0x6041 (PLL,UNSYNC,NANO,MODE),
>   time constant 3, precision 0.001 us, tolerance 500 ppm,





[jira] [Updated] (KUDU-1879) Support table without a primary key

2019-06-15 Thread Grant Henke (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-1879:
--
Labels: kudu-roadmap  (was: )

> Support table without a primary key
> ---
>
> Key: KUDU-1879
> URL: https://issues.apache.org/jira/browse/KUDU-1879
> Project: Kudu
>  Issue Type: New Feature
>  Components: master
>Affects Versions: 1.2.0
>Reporter: Amos Bird
>Priority: Major
>  Labels: kudu-roadmap
>
> PK-less tables are more suitable for "bulk loading and processing" workflow. 
> It would be easier to implement new features for this kind of table, such as 
> rebalancing. We may even be able to add raw cfiles directly. 





[jira] [Updated] (KUDU-1563) Add support for INSERT/UPDATE/DELETE IGNORE

2019-06-15 Thread Grant Henke (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-1563:
--
Labels: backup kudu-roadmap  (was: backup kudu-roadmap newbie)

> Add support for INSERT/UPDATE/DELETE IGNORE
> ---
>
> Key: KUDU-1563
> URL: https://issues.apache.org/jira/browse/KUDU-1563
> Project: Kudu
>  Issue Type: New Feature
>Reporter: Dan Burkert
>Assignee: Brock Noland
>Priority: Major
>  Labels: backup, kudu-roadmap
>
> The Java client currently has an [option to ignore duplicate row key errors| 
> https://kudu.apache.org/apidocs/org/kududb/client/AsyncKuduSession.html#setIgnoreAllDuplicateRows-boolean-],
>  which is implemented by filtering the errors on the client side.  If we are 
> going to continue to support this feature (and the consensus seems to be that 
> we probably should), we should promote it to a first class operation type 
> that is handled on the server side.  This would have a modest perf. 
> improvement since fewer errors are returned, and it would allow INSERT IGNORE 
> ops to be mixed in the same batch as other INSERT, DELETE, UPSERT, etc. ops.





[jira] [Resolved] (KUDU-1519) Kudu Website - Troubleshooting addition

2019-06-15 Thread Grant Henke (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-1519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke resolved KUDU-1519.
---
   Resolution: Invalid
Fix Version/s: n/a

> Kudu Website - Troubleshooting addition
> ---
>
> Key: KUDU-1519
> URL: https://issues.apache.org/jira/browse/KUDU-1519
> Project: Kudu
>  Issue Type: Task
>  Components: documentation
>Affects Versions: 0.9.0
> Environment: CDH 5.7, Kudu Quickstart VM
>Reporter: Suzanne McIntosh
>Priority: Minor
>  Labels: documentation
> Fix For: n/a
>
>
> Add to the Kudu website (http://kudu.apache.org/docs/quickstart.html)  one 
> additional troubleshooting tip to turn off the VPN if ssh reports an error.
> Here's what happened:
> - I issued this command: ssh demo@quickstart.cloudera
> - I received this error message: "ssh: connect to host quickstart.cloudera 
> port 22: Permission denied"
> Turning off the VPN is a quick solution useful for verifying that the VM can 
> be accessed. An alternative is to avoid using ssh and log into the VM 
> directly. Finally, if ssh'ing into the VM is preferred but ssh reports the 
> above error, it is probably due to an IP address conflict that will need to 
> be resolved.





[jira] [Commented] (KUDU-1519) Kudu Website - Troubleshooting addition

2019-06-15 Thread Grant Henke (JIRA)


[ 
https://issues.apache.org/jira/browse/KUDU-1519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16864818#comment-16864818
 ] 

Grant Henke commented on KUDU-1519:
---

We don't use the quickstart vm anymore. 

> Kudu Website - Troubleshooting addition
> ---
>
> Key: KUDU-1519
> URL: https://issues.apache.org/jira/browse/KUDU-1519
> Project: Kudu
>  Issue Type: Task
>  Components: documentation
>Affects Versions: 0.9.0
> Environment: CDH 5.7, Kudu Quickstart VM
>Reporter: Suzanne McIntosh
>Priority: Minor
>  Labels: documentation
>
> Add to the Kudu website (http://kudu.apache.org/docs/quickstart.html)  one 
> additional troubleshooting tip to turn off the VPN if ssh reports an error.
> Here's what happened:
> - I issued this command: ssh demo@quickstart.cloudera
> - I received this error message: "ssh: connect to host quickstart.cloudera 
> port 22: Permission denied"
> Turning off the VPN is a quick solution useful for verifying that the VM can 
> be accessed. An alternative is to avoid using ssh and log into the VM 
> directly. Finally, if ssh'ing into the VM is preferred but ssh reports the 
> above error, it is probably due to an IP address conflict that will need to 
> be resolved.





[jira] [Commented] (KUDU-1511) Register Kudu tables created via the Java client with Impala

2019-06-15 Thread Grant Henke (JIRA)


[ 
https://issues.apache.org/jira/browse/KUDU-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16864815#comment-16864815
 ] 

Grant Henke commented on KUDU-1511:
---

This is resolved in Kudu 1.10.0 via the HMS synchronization feature. 

> Register Kudu tables created via the Java client with Impala
> 
>
> Key: KUDU-1511
> URL: https://issues.apache.org/jira/browse/KUDU-1511
> Project: Kudu
>  Issue Type: Improvement
>  Components: client, impala
>Affects Versions: 0.9.0
>Reporter: Andrew Stevenson
>Priority: Major
>
> Give the option to register any tables created and altered via the Java 
> clients with Impala rather than having to go to Impala for this.





[jira] [Resolved] (KUDU-1511) Register Kudu tables created via the Java client with Impala

2019-06-15 Thread Grant Henke (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke resolved KUDU-1511.
---
   Resolution: Fixed
Fix Version/s: 1.10.0

> Register Kudu tables created via the Java client with Impala
> 
>
> Key: KUDU-1511
> URL: https://issues.apache.org/jira/browse/KUDU-1511
> Project: Kudu
>  Issue Type: Improvement
>  Components: client, impala
>Affects Versions: 0.9.0
>Reporter: Andrew Stevenson
>Priority: Major
> Fix For: 1.10.0
>
>
> Give the option to register any tables created and altered via the Java 
> clients with Impala rather than having to go to Impala for this.





[jira] [Commented] (KUDU-1498) Add support to Java client for read-your-writes consistency

2019-06-15 Thread Grant Henke (JIRA)


[ 
https://issues.apache.org/jira/browse/KUDU-1498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16864813#comment-16864813
 ] 

Grant Henke commented on KUDU-1498:
---

Is this done? I think [~hahao] may have wrapped up any remaining work. 

> Add support to Java client for read-your-writes consistency
> ---
>
> Key: KUDU-1498
> URL: https://issues.apache.org/jira/browse/KUDU-1498
> Project: Kudu
>  Issue Type: Sub-task
>  Components: client
>Reporter: Mike Percy
>Priority: Major
>
> The Java client could use a mode called "read your writes" consistency where 
> we ensure that we read whatever the leader has committed at the time of the 
> request.
> At the time of writing, the implementation requirements look like the 
> following:
> * Always scan from the leader
> * Specify that the leader must apply all operations from previous leaders 
> before processing the query
> In the C++ client, this can be achieved by specifying both of the LEADER_ONLY 
> and READ_AT_SNAPSHOT options, while not specifying a timestamp to use for the 
> snapshot when starting the scan.
> In the Java client API, we may want to simply expose a scan option called 
> "read your writes" or something similar.





[jira] [Resolved] (KUDU-1464) Document that Xcode toolchain is necessary for Client on OS X

2019-06-15 Thread Grant Henke (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke resolved KUDU-1464.
---
   Resolution: Fixed
Fix Version/s: NA

> Document that Xcode toolchain is necessary for Client on OS X
> -
>
> Key: KUDU-1464
> URL: https://issues.apache.org/jira/browse/KUDU-1464
> Project: Kudu
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 0.8.0
>Reporter: Ryan Bosshart
>Priority: Trivial
> Fix For: NA
>
>
> OS X users who want to install _just_ the Kudu client also need to install 
> the Xcode toolchain. This is documented for the server, but not the client. 





[jira] [Updated] (KUDU-1458) truncate table support for kudu tables

2019-06-15 Thread Grant Henke (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-1458:
--
Labels: backup kudu-roadmap  (was: backup)

> truncate table support for kudu tables
> --
>
> Key: KUDU-1458
> URL: https://issues.apache.org/jira/browse/KUDU-1458
> Project: Kudu
>  Issue Type: New Feature
>  Components: impala, master
>Reporter: nick
>Priority: Major
>  Labels: backup, kudu-roadmap
>
> Truncate will come in handy to quickly erase the table. The current delete 
> statement takes a long time when the record count is high.





[jira] [Updated] (KUDU-1438) [java client] Upgrade to Netty 4

2019-06-15 Thread Grant Henke (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-1438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-1438:
--
Labels: kudu-roadmap  (was: )

> [java client] Upgrade to Netty 4
> 
>
> Key: KUDU-1438
> URL: https://issues.apache.org/jira/browse/KUDU-1438
> Project: Kudu
>  Issue Type: Improvement
>  Components: client
>Reporter: Jean-Daniel Cryans
>Priority: Major
>  Labels: kudu-roadmap
>
> Netty 4 promises better performance for certain workloads; it was an effort 
> mostly led by Twitter. See their blog post about it: 
> https://blog.twitter.com/2013/netty-4-at-twitter-reduced-gc-overhead
> asynchbase has a pull request for this, so our work should be similar: 
> https://github.com/OpenTSDB/asynchbase/pull/116/commits



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KUDU-2864) Support NOT predicates

2019-06-15 Thread Grant Henke (JIRA)
Grant Henke created KUDU-2864:
-

 Summary: Support NOT predicates
 Key: KUDU-2864
 URL: https://issues.apache.org/jira/browse/KUDU-2864
 Project: Kudu
  Issue Type: Improvement
Reporter: Grant Henke


Support a NOT predicate that can be combined with other predicates.





[jira] [Updated] (KUDU-2864) Support NOT predicates

2019-06-15 Thread Grant Henke (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-2864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-2864:
--
Issue Type: Sub-task  (was: Improvement)
Parent: KUDU-1639

> Support NOT predicates
> --
>
> Key: KUDU-2864
> URL: https://issues.apache.org/jira/browse/KUDU-2864
> Project: Kudu
>  Issue Type: Sub-task
>Reporter: Grant Henke
>Priority: Major
>
> Support a NOT predicate that can be combined with other predicates.





[jira] [Updated] (KUDU-1639) Improve predicate pushdown

2019-06-15 Thread Grant Henke (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-1639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-1639:
--
Labels: kudu-roadmap perf  (was: perf)

> Improve predicate pushdown
> --
>
> Key: KUDU-1639
> URL: https://issues.apache.org/jira/browse/KUDU-1639
> Project: Kudu
>  Issue Type: Improvement
>  Components: client, tablet
>Reporter: Dan Burkert
>Priority: Major
>  Labels: kudu-roadmap, perf
>
> Umbrella ticket for proposed improvements to predicates, scan optimization 
> based on predicates, and partition pruning.





[jira] [Created] (KUDU-2863) Support OR predicates

2019-06-15 Thread Grant Henke (JIRA)
Grant Henke created KUDU-2863:
-

 Summary: Support OR predicates
 Key: KUDU-2863
 URL: https://issues.apache.org/jira/browse/KUDU-2863
 Project: Kudu
  Issue Type: Sub-task
Reporter: Grant Henke


Support combining predicates with an OR predicate.





[jira] [Updated] (KUDU-2516) Add NOT EQUAL predicate type

2019-06-15 Thread Grant Henke (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-2516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-2516:
--
Labels: kudu-roadmap  (was: )

> Add NOT EQUAL predicate type
> 
>
> Key: KUDU-2516
> URL: https://issues.apache.org/jira/browse/KUDU-2516
> Project: Kudu
>  Issue Type: Sub-task
>  Components: cfile, perf
>Affects Versions: 1.7.1
>Reporter: Mike Percy
>Priority: Major
>  Labels: kudu-roadmap
>
> Kudu currently does not have support for a NOT_EQUAL predicate type. This is 
> usually relevant when AND-ed together with other predicates.
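
As a rough illustration of how a NOT_EQUAL predicate behaves when AND-ed with another predicate, here is a minimal sketch using plain java.util.function.Predicate. The Row type and the helper names are hypothetical and are not part of the Kudu client API; this only shows the intended semantics, not how a server-side predicate would be implemented.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

final class NotEqualPredicateSketch {
    // Hypothetical row type used only for this illustration.
    static final class Row {
        final String host;
        final int status;

        Row(String host, int status) {
            this.host = host;
            this.status = status;
        }
    }

    // NOT_EQUAL on the status column: keeps rows whose status differs from value.
    static Predicate<Row> statusNotEqual(int value) {
        return row -> row.status != value;
    }

    // Applies a (possibly combined) predicate to every row, like a filtered scan.
    static List<Row> scan(List<Row> rows, Predicate<Row> filter) {
        return rows.stream().filter(filter).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
                new Row("a", 200), new Row("a", 500), new Row("b", 200));
        Predicate<Row> hostIsA = row -> row.host.equals("a");
        // host = 'a' AND status != 200
        List<Row> matched = scan(rows, hostIsA.and(statusNotEqual(200)));
        System.out.println(matched.size());
    }
}
```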





[jira] [Commented] (KUDU-1386) NaN float and double values are not handled correctly

2019-06-15 Thread Grant Henke (JIRA)


[ 
https://issues.apache.org/jira/browse/KUDU-1386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16864781#comment-16864781
 ] 

Grant Henke commented on KUDU-1386:
---

It looks like the linked gerrit is merged. Is this done [~wdberkeley]?

> NaN float and double values are not handled correctly
> -
>
> Key: KUDU-1386
> URL: https://issues.apache.org/jira/browse/KUDU-1386
> Project: Kudu
>  Issue Type: Sub-task
>  Components: tserver
>Reporter: Dan Burkert
>Assignee: Will Berkeley
>Priority: Minor
>
> {{TypeInfo::Compare}} for float and double values always returns 
> 0 when one of the arguments is a {{NaN}} value.  This results in equality 
> predicates never filtering {{NaN}} values.  {{TypeInfo::Compare}} should be 
> changed so that it doesn't assume that the data type is totally ordered.
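
The failure mode described above can be reproduced in isolation. The sketch below is in Java rather than Kudu's C++, and naiveCompare is a hypothetical stand-in for a comparison that assumes a total order; it is not the actual TypeInfo::Compare code. Because every ordered comparison involving NaN is false, such a function falls through to "equal".

```java
final class NanCompareSketch {
    // Three-way comparison that assumes the type is totally ordered.
    static int naiveCompare(double a, double b) {
        if (a < b) return -1;
        if (a > b) return 1;
        return 0; // also reached when either argument is NaN
    }

    // An equality predicate built on the naive comparison.
    static boolean naiveEquals(double cell, double literal) {
        return naiveCompare(cell, literal) == 0;
    }

    public static void main(String[] args) {
        // A NaN cell wrongly "matches" the predicate value = 1.0.
        System.out.println(naiveEquals(Double.NaN, 1.0));
        // Double.compare defines a total order in which NaN sorts last.
        System.out.println(Double.compare(Double.NaN, 1.0) > 0);
    }
}
```

Double.compare is shown only as one example of a comparison that handles NaN explicitly; whether Kudu should order NaN or treat it as unordered is the design question the issue raises.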





[jira] [Updated] (KUDU-1370) Implement bulk insert API

2019-06-15 Thread Grant Henke (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-1370:
--
Labels: kudu-roadmap  (was: )

> Implement bulk insert API
> -
>
> Key: KUDU-1370
> URL: https://issues.apache.org/jira/browse/KUDU-1370
> Project: Kudu
>  Issue Type: New Feature
>  Components: tablet
>Reporter: Mike Percy
>Priority: Major
>  Labels: kudu-roadmap
>
> Tracking JIRA to implement a bulk insert API. Consider also a bulk "upsert" 
> variant.





[jira] [Resolved] (KUDU-1371) [blog] Overview of the BINARY column usage in Java

2019-06-15 Thread Grant Henke (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-1371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke resolved KUDU-1371.
---
   Resolution: Won't Fix
Fix Version/s: n/a

> [blog] Overview of the BINARY column usage in Java
> --
>
> Key: KUDU-1371
> URL: https://issues.apache.org/jira/browse/KUDU-1371
> Project: Kudu
>  Issue Type: Task
>Reporter: Jean-Daniel Cryans
>Priority: Major
> Fix For: n/a
>
>
> Talking to ffomenko on Slack, it'd be nice to have more examples on how to 
> use the binary column type in the Java client. One source of confusion is 
> that the ByteBuffer API grabs the remaining() bytes, which can be 
> counter-intuitive (hey I just filled that BB and I get 0 bytes back!). Also 
> giving examples on the read side would be good, again counter-intuitively you 
> can't just use array() to get a byte[].
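
Both ByteBuffer surprises mentioned above can be shown with the JDK alone. This is a minimal sketch that does not touch the Kudu client; remainingBytes is a hypothetical helper name. It relies only on standard java.nio behavior: remaining() counts the bytes between position and limit, and array() returns the whole backing array.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class ByteBufferSketch {
    // Copies exactly the bytes between position and limit, leaving the
    // caller's buffer position untouched.
    static byte[] remainingBytes(ByteBuffer buf) {
        ByteBuffer view = buf.duplicate();
        byte[] out = new byte[view.remaining()];
        view.get(out);
        return out;
    }

    public static void main(String[] args) {
        byte[] payload = "kudu".getBytes(StandardCharsets.UTF_8);

        // Write side: right after filling the buffer, position == limit,
        // so nothing is "remaining" until the buffer is flipped.
        ByteBuffer full = ByteBuffer.allocate(payload.length);
        full.put(payload);
        System.out.println(full.remaining());
        full.flip();
        System.out.println(full.remaining());

        // Read side: array() exposes the whole backing array, which can be
        // larger than the payload.
        ByteBuffer roomy = ByteBuffer.allocate(8);
        roomy.put(payload);
        roomy.flip();
        System.out.println(roomy.array().length);
        System.out.println(remainingBytes(roomy).length);
    }
}
```

Note that array() can also throw for direct or read-only buffers, which is another reason to copy the remaining bytes instead.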





[jira] [Commented] (KUDU-1351) docs: Remove "All rights reserved" from Javadoc footer

2019-06-15 Thread Grant Henke (JIRA)


[ 
https://issues.apache.org/jira/browse/KUDU-1351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16864779#comment-16864779
 ] 

Grant Henke commented on KUDU-1351:
---

[~mpercy] Is this still relevant? Not 100% sure what it's referencing. 

> docs: Remove "All rights reserved" from Javadoc footer
> --
>
> Key: KUDU-1351
> URL: https://issues.apache.org/jira/browse/KUDU-1351
> Project: Kudu
>  Issue Type: Bug
>  Components: build, documentation
>Reporter: Mike Percy
>Priority: Major
>
> Replace with the appropriate copyright message.





[jira] [Updated] (KUDU-1344) Make it easier to install Kudu from source

2019-06-15 Thread Grant Henke (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-1344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-1344:
--
Labels: kudu-roadmap  (was: )

> Make it easier to install Kudu from source
> --
>
> Key: KUDU-1344
> URL: https://issues.apache.org/jira/browse/KUDU-1344
> Project: Kudu
>  Issue Type: Bug
>  Components: build
>Affects Versions: 0.7.0
>Reporter: Adar Dembo
>Assignee: Greg Solovyev
>Priority: Major
>  Labels: kudu-roadmap
>
> At the moment, the best resource we have for getting Kudu from raw source 
> code to "up and running" is the documentation that describes how to build it. 
> Some gaps:
> # Docs that explain how to "install" it (i.e. where to copy the binaries, how 
> to integrate into init.d/systemd, where to place configuration files, etc.)
> # A "make install" target that doesn't just install C++ client files but 
> instead captures the above too.
> # Init.d scripts and basic configuration files so people don't have to write 
> their own.
> # Packaging code so it'd be possible to build system packages from source 
> code (i.e. "soup to nuts"), perhaps using 
> [fpm|https://github.com/jordansissel/fpm].
> Step 3 can be provided by opening up more of the internal Cloudera packaging 
> infrastructure for Kudu. Step 4 may obviate the need for said infrastructure 
> altogether, which isn't a bad idea because it's inscrutable, complicated and 
> very error-prone.





[jira] [Commented] (KUDU-1335) Migrate old Kudu design documentation into the source tree

2019-06-15 Thread Grant Henke (JIRA)


[ 
https://issues.apache.org/jira/browse/KUDU-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16864777#comment-16864777
 ] 

Grant Henke commented on KUDU-1335:
---

Is this work done?

> Migrate old Kudu design documentation into the source tree
> --
>
> Key: KUDU-1335
> URL: https://issues.apache.org/jira/browse/KUDU-1335
> Project: Kudu
>  Issue Type: Task
>  Components: documentation
>Reporter: Adar Dembo
>Priority: Minor
>
> These should be sourced from Cloudera's internal wiki and Google docs, and 
> possibly from other places. We need to make sure they're scrubbed of any 
> confidential information.
> See [this gerrit|http://gerrit.cloudera.org:8080/#/c/2149/] for more 
> information.





[jira] [Resolved] (KUDU-1331) Add scripts to build environment easier with Docker

2019-06-15 Thread Grant Henke (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-1331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke resolved KUDU-1331.
---
   Resolution: Fixed
 Assignee: Grant Henke  (was: Tsuyoshi Ozawa)
Fix Version/s: 1.9.0

This was resolved via docker work in Kudu 1.9.0. See here for details: 

[https://github.com/apache/kudu/tree/master/docker]

> Add scripts to build environment easier with Docker
> ---
>
> Key: KUDU-1331
> URL: https://issues.apache.org/jira/browse/KUDU-1331
> Project: Kudu
>  Issue Type: Improvement
>  Components: build
>Reporter: Tsuyoshi Ozawa
>Assignee: Grant Henke
>Priority: Major
> Fix For: 1.9.0
>
>
> Currently, potential contributors must set up build environments by 
> themselves as described in 
> http://getkudu.io/docs/installation.html#ubuntu_from_source
> Docker can help them set up build environments more easily.





[jira] [Updated] (KUDU-1271) Column ordering constraint in Kudu

2019-06-15 Thread Grant Henke (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-1271:
--
Labels: kudu-roadmap  (was: )

> Column ordering constraint in Kudu
> --
>
> Key: KUDU-1271
> URL: https://issues.apache.org/jira/browse/KUDU-1271
> Project: Kudu
>  Issue Type: Improvement
>  Components: master
>Affects Versions: 0.5.0
> Environment: Cloudera CDH 5.4.x
>Reporter: Abhi Basu
>Priority: Major
>  Labels: kudu-roadmap
>
> I get this error when I am attempting to create a Kudu table as a select from 
> an existing Impala/Hive table. The last column of the table is rowid (int) 
> that is going to be used as primary key for this table. 
> Error: IllegalArgumentException: Got out-of-order primary key column: Column 
> name: rowid, type:
> SQL example:
> CREATE TABLE newtable_kudu
> TBLPROPERTIES(
>  'storage_handler' = 'com.cloudera.kudu.hive.KuduStorageHandler',
>   'kudu.table_name' = 'newtable_kudu',
>   'kudu.master_addresses' = 'hostname:7051',
>   'kudu.key_columns' = 'rowid'
>  ) AS SELECT * FROM oldtable_impala;
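
The error message suggests that primary key columns are expected to come before all other columns, whereas SELECT * leaves rowid last. Assuming that is the constraint, one workaround is to list the columns explicitly with the key first. The sketch below only computes such an ordering; keysFirst and the column names other than rowid are hypothetical and no Kudu or Impala API is involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class KeyColumnsFirstSketch {
    // Returns the columns reordered so that key columns come first, in the
    // order given by keyColumns, followed by the rest in their original order.
    static List<String> keysFirst(List<String> columns, List<String> keyColumns) {
        List<String> ordered = new ArrayList<>(keyColumns);
        for (String column : columns) {
            if (!keyColumns.contains(column)) {
                ordered.add(column);
            }
        }
        return ordered;
    }

    public static void main(String[] args) {
        // Hypothetical source table whose last column is the intended key.
        List<String> columns = Arrays.asList("name", "city", "rowid");
        List<String> selectList = keysFirst(columns, Arrays.asList("rowid"));
        // Could be used to build "SELECT rowid, name, city FROM ..." instead of "SELECT *".
        System.out.println(String.join(", ", selectList));
    }
}
```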





[jira] [Updated] (KUDU-1261) Support nested data types

2019-06-15 Thread Grant Henke (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-1261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-1261:
--
Labels: kudu-roadmap  (was: )

> Support nested data types
> -
>
> Key: KUDU-1261
> URL: https://issues.apache.org/jira/browse/KUDU-1261
> Project: Kudu
>  Issue Type: New Feature
>Reporter: Jean-Daniel Cryans
>Priority: Major
>  Labels: kudu-roadmap
>
> AKA complex data types.
> This is a common ask. I'm creating this jira so that we can at least start 
> tracking how people want to use it.





[jira] [Commented] (KUDU-1258) Add table manipulation from kudu-client-tools and Web UI

2019-06-15 Thread Grant Henke (JIRA)


[ 
https://issues.apache.org/jira/browse/KUDU-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16864774#comment-16864774
 ] 

Grant Henke commented on KUDU-1258:
---

Marking this as resolved as the command line tools now allow for a table to be 
dropped and altered. 
[https://kudu.apache.org/docs/command_line_tools_reference.html#table]

For more complex behavior, the kudu-python client could be used, or even the 
spark-shell with the kudu-spark integration.

> Add table manipulation from kudu-client-tools and Web UI
> 
>
> Key: KUDU-1258
> URL: https://issues.apache.org/jira/browse/KUDU-1258
> Project: Kudu
>  Issue Type: New Feature
>  Components: ops-tooling
>Reporter: Tsuyoshi Ozawa
>Priority: Major
>
> Currently, CREATE TABLE and DROP TABLE can be done from a custom client or 
> impala-kudu. 
> I've heard that the backend of Hadoop's Timeline server can be Kudu, so it 
> would be better to add a CLI-based or web-UI-based tool to manipulate tables. 
> I gave a lightning talk at Cloudera World Tokyo about this topic: 
> https://gist.github.com/oza/6fdd9cfd548b74526d32





[jira] [Resolved] (KUDU-1258) Add table manipulation from kudu-client-tools and Web UI

2019-06-15 Thread Grant Henke (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke resolved KUDU-1258.
---
   Resolution: Fixed
Fix Version/s: NA

> Add table manipulation from kudu-client-tools and Web UI
> 
>
> Key: KUDU-1258
> URL: https://issues.apache.org/jira/browse/KUDU-1258
> Project: Kudu
>  Issue Type: New Feature
>  Components: ops-tooling
>Reporter: Tsuyoshi Ozawa
>Priority: Major
> Fix For: NA
>
>
> Currently, CREATE TABLE and DROP TABLE can be done from a custom client or 
> impala-kudu. 
> I've heard that the backend of Hadoop's Timeline server can be Kudu, so it 
> would be better to add a CLI-based or web-UI-based tool to manipulate tables. 
> I gave a lightning talk at Cloudera World Tokyo about this topic: 
> https://gist.github.com/oza/6fdd9cfd548b74526d32





[jira] [Commented] (KUDU-1255) Add API to PartialRow for timestamps

2019-06-15 Thread Grant Henke (JIRA)


[ 
https://issues.apache.org/jira/browse/KUDU-1255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16864769#comment-16864769
 ] 

Grant Henke commented on KUDU-1255:
---

This was done in 
[https://github.com/apache/kudu/commit/5f9a2f523a58990af63b1b6eaef16dfc35eabddd]

> Add API to PartialRow for timestamps
> 
>
> Key: KUDU-1255
> URL: https://issues.apache.org/jira/browse/KUDU-1255
> Project: Kudu
>  Issue Type: Improvement
>  Components: client
>Reporter: Mike Percy
>Priority: Major
>
> Currently if you want to set a TIMESTAMP column from the Java client you need 
> to use PartialRow.addLong(). It would be better to decouple this and add an 
> API for PartialRow.addTimestamp() for usability and to have more flexibility 
> to change the implementation in the future.
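
To make the coupling concrete: with addLong(), the caller has to know how a timestamp is encoded as a long. The sketch below assumes the encoding is microseconds since the Unix epoch, which is not stated in this thread, and uses only the JDK. toEpochMicros is a hypothetical helper, not a Kudu client method; it is the kind of conversion a dedicated timestamp API could hide.

```java
import java.time.Instant;

final class TimestampMicrosSketch {
    // Converts an Instant to microseconds since the Unix epoch, truncating
    // sub-microsecond precision. Assumed encoding; see the note above.
    static long toEpochMicros(Instant t) {
        long micros = Math.multiplyExact(t.getEpochSecond(), 1_000_000L);
        return Math.addExact(micros, t.getNano() / 1_000);
    }

    public static void main(String[] args) {
        Instant when = Instant.parse("2019-06-15T00:00:00Z");
        // With a long-based API, this is the value the caller would pass.
        System.out.println(toEpochMicros(when));
    }
}
```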





[jira] [Assigned] (KUDU-1255) Add API to PartialRow for timestamps

2019-06-15 Thread Grant Henke (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-1255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke reassigned KUDU-1255:
-

Assignee: Grant Henke

> Add API to PartialRow for timestamps
> 
>
> Key: KUDU-1255
> URL: https://issues.apache.org/jira/browse/KUDU-1255
> Project: Kudu
>  Issue Type: Improvement
>  Components: client
>Reporter: Mike Percy
>Assignee: Grant Henke
>Priority: Major
> Fix For: 1.8.0
>
>
> Currently if you want to set a TIMESTAMP column from the Java client you need 
> to use PartialRow.addLong(). It would be better to decouple this and add an 
> API for PartialRow.addTimestamp() for usability and to have more flexibility 
> to change the implementation in the future.





[jira] [Resolved] (KUDU-1255) Add API to PartialRow for timestamps

2019-06-15 Thread Grant Henke (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-1255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke resolved KUDU-1255.
---
   Resolution: Fixed
Fix Version/s: 1.8.0

> Add API to PartialRow for timestamps
> 
>
> Key: KUDU-1255
> URL: https://issues.apache.org/jira/browse/KUDU-1255
> Project: Kudu
>  Issue Type: Improvement
>  Components: client
>Reporter: Mike Percy
>Priority: Major
> Fix For: 1.8.0
>
>
> Currently if you want to set a TIMESTAMP column from the Java client you need 
> to use PartialRow.addLong(). It would be better to decouple this and add an 
> API for PartialRow.addTimestamp() for usability and to have more flexibility 
> to change the implementation in the future.





[jira] [Resolved] (KUDU-1254) Missing docs on how to specify replica count when creating table via Impala

2019-06-15 Thread Grant Henke (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-1254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke resolved KUDU-1254.
---
Resolution: Fixed

This exists in the docs here: 
[https://kudu.apache.org/docs/kudu_impala_integration.html#kudu_impala_create_table]

> Missing docs on how to specify replica count when creating table via Impala
> ---
>
> Key: KUDU-1254
> URL: https://issues.apache.org/jira/browse/KUDU-1254
> Project: Kudu
>  Issue Type: Bug
>Affects Versions: Public beta
>Reporter: Mike Percy
>Priority: Major
> Fix For: n/a
>
>
> There's currently no documented way to create a Kudu table via Impala and 
> specify the number of replicas each tablet will have.





[jira] [Updated] (KUDU-1235) Add Get API

2019-06-15 Thread Grant Henke (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-1235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-1235:
--
Labels: kudu-roadmap  (was: )

> Add Get API
> ---
>
> Key: KUDU-1235
> URL: https://issues.apache.org/jira/browse/KUDU-1235
> Project: Kudu
>  Issue Type: New Feature
>  Components: client, tablet, tserver
>Reporter: Binglin Chang
>Assignee: Binglin Chang
>Priority: Major
>  Labels: kudu-roadmap
> Attachments: perf-get.svg, perf-scan-opt.svg, perf-scan.svg
>
>
> A Get API is more user-friendly and efficient if users just want a primary key 
> lookup.
> I set up a cluster and tested get/scan of a single row using YCSB; the initial 
> test shows better performance for get.
> {noformat}
> kudu_workload:
> recordcount=100
> operationcount=100
> workload=com.yahoo.ycsb.workloads.CoreWorkload
> readallfields=false
> readproportion=1
> updateproportion=0
> scanproportion=0
> insertproportion=0
> requestdistribution=uniform
> use_get_api=false
> load:
> ./bin/ycsb load kudu -P workloads/kudu_workload -p sync_ops=false -p 
> pre_split_num_tablets=1 -p table_name=ycsb_wiki_example -p 
> masterQuorum='c3-kudu-tst-st01.bj:32600' -threads 100
> read test:
> ./bin/ycsb run kudu -P workloads/kudu_workload -p 
> masterQuorum='c3-kudu-tst-st01.bj:32600' -threads 100
> {noformat}
> Get API:
> [OVERALL], RunTime(ms), 21304.0
> [OVERALL], Throughput(ops/sec), 46939.54187007135
> [CLEANUP], Operations, 100.0
> [CLEANUP], AverageLatency(us), 423.57
> [CLEANUP], MinLatency(us), 24.0
> [CLEANUP], MaxLatency(us), 19327.0
> [CLEANUP], 95thPercentileLatency(us), 52.0
> [CLEANUP], 99thPercentileLatency(us), 18815.0
> [READ], Operations, 100.0
> [READ], AverageLatency(us), 2065.185152
> [READ], MinLatency(us), 134.0
> [READ], MaxLatency(us), 92159.0
> [READ], 95thPercentileLatency(us), 2391.0
> [READ], 99thPercentileLatency(us), 6359.0
> [READ], Return=0, 100
> Scan API:
> [OVERALL], RunTime(ms), 38259.0
> [OVERALL], Throughput(ops/sec), 26137.6408165399
> [CLEANUP], Operations, 100.0
> [CLEANUP], AverageLatency(us), 47.32
> [CLEANUP], MinLatency(us), 16.0
> [CLEANUP], MaxLatency(us), 1837.0
> [CLEANUP], 95thPercentileLatency(us), 41.0
> [CLEANUP], 99thPercentileLatency(us), 158.0
> [READ], Operations, 100.0
> [READ], AverageLatency(us), 3595.825249
> [READ], MinLatency(us), 139.0
> [READ], MaxLatency(us), 3139583.0
> [READ], 95thPercentileLatency(us), 3775.0
> [READ], 99thPercentileLatency(us), 7659.0
> [READ], Return=0, 100





[jira] [Updated] (KUDU-1220) Improve bulk loads from multiple sequential writers

2019-06-15 Thread Grant Henke (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-1220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-1220:
--
Labels: kudu-roadmap  (was: )

> Improve bulk loads from multiple sequential writers
> ---
>
> Key: KUDU-1220
> URL: https://issues.apache.org/jira/browse/KUDU-1220
> Project: Kudu
>  Issue Type: Improvement
>  Components: backup, perf
>Affects Versions: Public beta
>Reporter: Jean-Daniel Cryans
>Assignee: Todd Lipcon
>Priority: Major
>  Labels: kudu-roadmap
> Attachments: orderkeys.py, write-pattern.png
>
>
> We ran some experiments loading lineitem at scale factor 15k. The 10-node 
> cluster (1 master, 9 TS) is equipped with Intel P3700 SSDs, one per TS, 
> dedicated for the WALs. The table is hash-partitioned and set to have 10 
> tablets per TS.
> Our findings:
> - Reading the bloom filters puts a lot of contention on the block cache. This 
> isn't new, see KUDU-613, but it's now coming up when writing because the SSDs 
> are just really fast.
> - Kudu performs best when data is inserted in order, but with hash 
> partitioning we end up with multiple clients writing simultaneously in different 
> key ranges in each tablet. This becomes a worst-case scenario: we have to 
> compact (optimize) the row sets over and over again to put them in order. 
> Even if we were to delay this to the end of the bulk load, we're still taking 
> a hit because we have to look at more and more bloom filters to check if a 
> row currently exists or not.
> - In the case of an initial bulk load, we know we're not trying to overwrite 
> rows or update them, so all those checks are unnecessary.
> Some ideas for improvements:
> - Obviously, we need a better block cache.
> - When flushing, we could detect those disjoint sets of rows and make sure 
> they map to row sets that don't cover the gaps. For example, if the MRS has 
> a,b,c,x,y,z then flushing would give us two row sets, e.g. a,b,c and x,y,z, 
> instead of one. The danger here is generating too many row sets.
> - When reading, the row set interval tree could be smart enough not to send 
> readers into the row set gaps. Again with the same example, let's say we're 
> looking for "m", normally we'd see a row set that's a-z so we'd have to check 
> its bloom filter, but if we could detect that it's actually a-c then x-z then 
> we'd save a check.
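
The last idea above, keeping readers out of the gaps inside a row set, can be sketched with a sorted map of covered sub-ranges. This is only an illustration of the lookup logic under the a-c / x-z example; RowSetGapsSketch and its methods are hypothetical and do not reflect Kudu's actual interval tree or row set metadata.

```java
import java.util.Map;
import java.util.TreeMap;

final class RowSetGapsSketch {
    // Covered sub-ranges of one row set: start key -> inclusive end key.
    // Ranges are assumed not to overlap.
    private final TreeMap<String, String> ranges = new TreeMap<>();

    void addRange(String start, String end) {
        ranges.put(start, end);
    }

    // True only if the key falls inside a covered sub-range, i.e. only then
    // would the (more expensive) bloom filter check be worth doing.
    boolean mayContain(String key) {
        Map.Entry<String, String> candidate = ranges.floorEntry(key);
        return candidate != null && key.compareTo(candidate.getValue()) <= 0;
    }

    public static void main(String[] args) {
        RowSetGapsSketch rowSet = new RowSetGapsSketch();
        rowSet.addRange("a", "c");
        rowSet.addRange("x", "z");
        // "m" lies in the gap, so the bloom filter check can be skipped.
        System.out.println(rowSet.mayContain("m"));
        System.out.println(rowSet.mayContain("b"));
    }
}
```

The trade-off mirrors the one noted for flushing: tracking many small sub-ranges saves bloom filter checks but adds metadata per row set.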





[jira] [Resolved] (KUDU-1204) Make the quickstart demo independent of the demo VM

2019-06-15 Thread Grant Henke (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-1204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke resolved KUDU-1204.
---
   Resolution: Invalid
Fix Version/s: NA

We no longer use the quickstart vm.

> Make the quickstart demo independent of the demo VM
> ---
>
> Key: KUDU-1204
> URL: https://issues.apache.org/jira/browse/KUDU-1204
> Project: Kudu
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Eli Collins
>Priority: Minor
> Fix For: NA
>
>
> The quickstart demo should work from a CSD or developer install (i.e. 
> independent of the demo VM). I think the only two changes necessary are adding 
> a pointer to the Kudu Impala integration docs and fixing up the script to 
> work with the data you get from the SFO site as is (which does not have an ID 
> field).





[jira] [Commented] (KUDU-446) Integrate with Presto

2019-06-15 Thread Piotr Findeisen (JIRA)


[ 
https://issues.apache.org/jira/browse/KUDU-446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16864759#comment-16864759
 ] 

Piotr Findeisen commented on KUDU-446:
--


The "Add Kudu connector" PR (https://github.com/prestodb/presto/pull/10388) was 
reviewed by [~kokosing], [~electrum] and myself ([~findepi]).
All of us contribute to Presto Community repo at 
https://github.com/prestosql/presto/

- Presto Kudu connector source 
https://github.com/prestosql/presto/tree/master/presto-kudu

If you have any issues or improvement ideas for the Kudu connector, or pull 
requests, you're more than welcome to file them:  
https://github.com/prestosql/presto/issues, 
https://github.com/prestosql/presto/pulls



> Integrate with Presto
> -
>
> Key: KUDU-446
> URL: https://issues.apache.org/jira/browse/KUDU-446
> Project: Kudu
>  Issue Type: Task
>  Components: client
>Affects Versions: M4.5
>Reporter: Jean-Daniel Cryans
>Priority: Trivial
>  Labels: kudu-roadmap
> Fix For: NA
>
>
> Presto is a relatively popular SQL tool in use in the community. It would be 
> nice to have some support for reading/writing data in Presto for those that 
> use it.
> This is likely to be implemented outside of the Kudu project itself, but this 
> JIRA can serve as a useful place for contributors to discuss the 
> implementation.





[jira] [Resolved] (KUDU-1119) Consider supporting Impala tablespaces for Kudu tables

2019-06-15 Thread Grant Henke (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-1119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke resolved KUDU-1119.
---
   Resolution: Invalid
Fix Version/s: NA

> Consider supporting Impala tablespaces for Kudu tables
> --
>
> Key: KUDU-1119
> URL: https://issues.apache.org/jira/browse/KUDU-1119
> Project: Kudu
>  Issue Type: Improvement
>  Components: impala
>Affects Versions: Public beta
>Reporter: Misty Linville
>Assignee: Dan Burkert
>Priority: Major
> Fix For: NA
>
>
> Right now if you create a table using Impala, in a given Impala database 
> (my_database:my_table), Kudu strips out the database part and just calls the 
> table my_table. This creates a requirement for Kudu table names to be unique 
> across all Impala databases, and may be surprising behavior to seasoned 
> Impala users. I'm filing this ticket at [~tlipcon]'s request and may be 
> getting some of the details / limitations wrong.





[jira] [Commented] (KUDU-1119) Consider supporting Impala tablespaces for Kudu tables

2019-06-15 Thread Grant Henke (JIRA)


[ 
https://issues.apache.org/jira/browse/KUDU-1119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16864745#comment-16864745
 ] 

Grant Henke commented on KUDU-1119:
---

This has been somewhat superseded by changes in Impala functionality and HMS 
sync functionality. Before HMS sync, tables created by Impala use the format 
`impala::database.tablename`. With HMS sync, the table is created as 
`database.tablename`.

> Consider supporting Impala tablespaces for Kudu tables
> --
>
> Key: KUDU-1119
> URL: https://issues.apache.org/jira/browse/KUDU-1119
> Project: Kudu
>  Issue Type: Improvement
>  Components: impala
>Affects Versions: Public beta
>Reporter: Misty Linville
>Assignee: Dan Burkert
>Priority: Major
>
> Right now if you create a table using Impala, in a given Impala database 
> (my_database:my_table), Kudu strips out the database part and just calls the 
> table my_table. This creates a requirement for Kudu table names to be unique 
> across all Impala databases, and may be surprising behavior to seasoned 
> Impala users. I'm filing this ticket at [~tlipcon]'s request and may be 
> getting some of the details / limitations wrong.





[jira] [Commented] (KUDU-1028) Be more graceful about clock unsynch errors

2019-06-15 Thread Grant Henke (JIRA)


[ 
https://issues.apache.org/jira/browse/KUDU-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16864743#comment-16864743
 ] 

Grant Henke commented on KUDU-1028:
---

Should we still leave this Jira open, or should we close it as Won't Do?

> Be more graceful about clock unsynch errors
> ---
>
> Key: KUDU-1028
> URL: https://issues.apache.org/jira/browse/KUDU-1028
> Project: Kudu
>  Issue Type: Improvement
>  Components: tserver
>Affects Versions: Feature Complete
>Reporter: David Alves
>Priority: Major
>
> We should likely refuse to execute the txns but not crash outright.





[jira] [Updated] (KUDU-1007) Add CreateIfNotExists API for tables

2019-06-15 Thread Grant Henke (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-1007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-1007:
--
Labels: hackathon-feedback kudu-roadmap  (was: hackathon-feedback)

> Add CreateIfNotExists API for tables
> 
>
> Key: KUDU-1007
> URL: https://issues.apache.org/jira/browse/KUDU-1007
> Project: Kudu
>  Issue Type: New Feature
>  Components: client
>Affects Versions: Feature Complete
>Reporter: Mike Percy
>Priority: Major
>  Labels: hackathon-feedback, kudu-roadmap
>
> People want a non-racy API to create tables if needed without having to 
> handle Java exceptions.





[jira] [Assigned] (KUDU-982) nullable columns should support DEFAULT NULL

2019-06-15 Thread Grant Henke (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke reassigned KUDU-982:


Assignee: Grant Henke

> nullable columns should support DEFAULT NULL
> 
>
> Key: KUDU-982
> URL: https://issues.apache.org/jira/browse/KUDU-982
> Project: Kudu
>  Issue Type: Improvement
>  Components: api, client, master
>Affects Versions: Private Beta
>Reporter: Todd Lipcon
>Assignee: Grant Henke
>Priority: Major
>
> I don't think we have APIs which work for setting the default to NULL in 
> Alter/Create.





[jira] [Resolved] (KUDU-968) Allow double-deletes of rows if requested.

2019-06-15 Thread Grant Henke (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke resolved KUDU-968.
--
   Resolution: Duplicate
Fix Version/s: n/a

> Allow double-deletes of rows if requested.
> --
>
> Key: KUDU-968
> URL: https://issues.apache.org/jira/browse/KUDU-968
> Project: Kudu
>  Issue Type: Improvement
>  Components: client
>Affects Versions: 1.2.0
>Reporter: Martin Grund
>Priority: Major
> Fix For: n/a
>
>
> When issuing a DELETE stmt from Impala that performs a join, it is possible 
> that the generated set of rows is not distinct, which can lead to double 
> deletes of single rows. Right now, we iterate over the errors in the session 
> and ignore {{Status::IsNotFound}}. In the future, it would be nice to simply 
> pass a flag to {{KuduTable::NewDelete}} that does not propagate those errors 
> back to the client.
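
The current client-side workaround described above, iterating over the session errors and ignoring the "not found" ones, amounts to a simple filter. The sketch below uses hypothetical stand-in types (Code, RowError) rather than the real C++ client classes, so it only shows the shape of the logic that a server-side flag would make unnecessary.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class IgnoreNotFoundSketch {
    // Hypothetical per-row error codes; not the real client status type.
    enum Code { NOT_FOUND, ALREADY_PRESENT, IO_ERROR }

    // Hypothetical per-row error record.
    static final class RowError {
        final String key;
        final Code code;

        RowError(String key, Code code) {
            this.key = key;
            this.code = code;
        }
    }

    // Drops NOT_FOUND errors (e.g. from double deletes) and keeps the rest.
    static List<RowError> dropNotFound(List<RowError> errors) {
        List<RowError> kept = new ArrayList<>();
        for (RowError error : errors) {
            if (error.code != Code.NOT_FOUND) {
                kept.add(error);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<RowError> errors = Arrays.asList(
                new RowError("k1", Code.NOT_FOUND),
                new RowError("k2", Code.IO_ERROR),
                new RowError("k1", Code.NOT_FOUND));
        // Only the I/O error still needs to be reported to the caller.
        System.out.println(dropNotFound(errors).size());
    }
}
```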





[jira] [Updated] (KUDU-1563) Add support for INSERT IGNORE

2019-06-15 Thread Grant Henke (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-1563:
--
Labels: backup kudu-roadmap newbie  (was: backup newbie)

> Add support for INSERT IGNORE
> -
>
> Key: KUDU-1563
> URL: https://issues.apache.org/jira/browse/KUDU-1563
> Project: Kudu
>  Issue Type: New Feature
>Reporter: Dan Burkert
>Assignee: Brock Noland
>Priority: Major
>  Labels: backup, kudu-roadmap, newbie
>
> The Java client currently has an [option to ignore duplicate row key errors| 
> https://kudu.apache.org/apidocs/org/kududb/client/AsyncKuduSession.html#setIgnoreAllDuplicateRows-boolean-],
>  which is implemented by filtering the errors on the client side.  If we are 
> going to continue to support this feature (and the consensus seems to be that 
> we probably should), we should promote it to a first class operation type 
> that is handled on the server side.  This would give a modest perf 
> improvement since fewer errors are returned, and it would allow INSERT IGNORE 
> ops to be mixed in the same batch as other INSERT, DELETE, UPSERT, etc. ops.





[jira] [Updated] (KUDU-1563) Add support for INSERT/UPDATE/DELETE IGNORE

2019-06-15 Thread Grant Henke (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-1563:
--
Summary: Add support for INSERT/UPDATE/DELETE IGNORE  (was: Add support for 
INSERT IGNORE)

> Add support for INSERT/UPDATE/DELETE IGNORE
> ---
>
> Key: KUDU-1563
> URL: https://issues.apache.org/jira/browse/KUDU-1563
> Project: Kudu
>  Issue Type: New Feature
>Reporter: Dan Burkert
>Assignee: Brock Noland
>Priority: Major
>  Labels: backup, kudu-roadmap, newbie
>
> The Java client currently has an [option to ignore duplicate row key errors| 
> https://kudu.apache.org/apidocs/org/kududb/client/AsyncKuduSession.html#setIgnoreAllDuplicateRows-boolean-],
>  which is implemented by filtering the errors on the client side.  If we are 
> going to continue to support this feature (and the consensus seems to be that 
> we probably should), we should promote it to a first class operation type 
> that is handled on the server side.  This would give a modest perf 
> improvement since fewer errors are returned, and it would allow INSERT IGNORE 
> ops to be mixed in the same batch as other INSERT, DELETE, UPSERT, etc. ops.





[jira] [Commented] (KUDU-887) Test and stabilize DeleteTable

2019-06-15 Thread Grant Henke (JIRA)


[ 
https://issues.apache.org/jira/browse/KUDU-887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16864740#comment-16864740
 ] 

Grant Henke commented on KUDU-887:
--

Do you think there is more testing that needs to be done yet?

> Test and stabilize DeleteTable
> --
>
> Key: KUDU-887
> URL: https://issues.apache.org/jira/browse/KUDU-887
> Project: Kudu
>  Issue Type: Task
>  Components: master, test
>Affects Versions: Private Beta
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Major
>
> DeleteTable isn't very well tested right now, especially in the replicated 
> case. We should add some stress tests in this area.





[jira] [Commented] (KUDU-886) Cluster load balancing

2019-06-15 Thread Grant Henke (JIRA)


[ 
https://issues.apache.org/jira/browse/KUDU-886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16864737#comment-16864737
 ] 

Grant Henke commented on KUDU-886:
--

I will link in a related Jira for range balancing (KUDU-2823). Is there a Jira 
tracking leader balancing? 

Maybe this should be closed because a large chunk of work is done and new jiras 
can track follow-on work. What do you think [~wdberkeley]?

> Cluster load balancing
> --
>
> Key: KUDU-886
> URL: https://issues.apache.org/jira/browse/KUDU-886
> Project: Kudu
>  Issue Type: New Feature
>  Components: master
>Affects Versions: 1.2.0
>Reporter: Todd Lipcon
>Assignee: Will Berkeley
>Priority: Major
>  Labels: kudu-roadmap
>
> We should add some load balancing support for GA:
> - move leaders to evenly spread RPC load.
> - eventually move tablets to even out disk space or load.





[jira] [Updated] (KUDU-886) Cluster load balancing

2019-06-15 Thread Grant Henke (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-886:
-
Labels: kudu-roadmap  (was: )

> Cluster load balancing
> --
>
> Key: KUDU-886
> URL: https://issues.apache.org/jira/browse/KUDU-886
> Project: Kudu
>  Issue Type: New Feature
>  Components: master
>Affects Versions: 1.2.0
>Reporter: Todd Lipcon
>Assignee: Will Berkeley
>Priority: Major
>  Labels: kudu-roadmap
>
> We should add some load balancing support for GA:
> - move leaders to evenly spread RPC load.
> - eventually move tablets to even out disk space or load.





[jira] [Updated] (KUDU-749) Improve performance for zipfian update

2019-06-15 Thread Grant Henke (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-749:
-
Labels: kudu-roadmap  (was: )

> Improve performance for zipfian update
> --
>
> Key: KUDU-749
> URL: https://issues.apache.org/jira/browse/KUDU-749
> Project: Kudu
>  Issue Type: Improvement
>  Components: perf, tablet
>Affects Versions: Private Beta
>Reporter: Todd Lipcon
>Priority: Major
>  Labels: kudu-roadmap
> Attachments: screenshot-1.png, screenshot-2.png
>
>
> A zipfian 50/50 update/read workload on YCSB gets slower and slower until 
> it's pretty intolerable (random reads taking 100+ms of CPU). It seems like 
> all the CPU is spent in DMSIterator::PrepareBatch. We're probably doing 
> something dumb here - let's look for some low hanging fruit to fix this.





[jira] [Commented] (KUDU-705) YCSB OOMEs when running with lots of threads

2019-06-15 Thread Grant Henke (JIRA)


[ 
https://issues.apache.org/jira/browse/KUDU-705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16864735#comment-16864735
 ] 

Grant Henke commented on KUDU-705:
--

Do you think this issue still exists [~tlipcon]?

> YCSB OOMEs when running with lots of threads
> 
>
> Key: KUDU-705
> URL: https://issues.apache.org/jira/browse/KUDU-705
> Project: Kudu
>  Issue Type: Bug
>  Components: client
>Affects Versions: Private Beta
>Reporter: Todd Lipcon
>Priority: Major
>
> I tried running a YCSB workload against a 100-node cluster with 64 threads. 
> It OOMEd pretty quickly with "GC overhead limit exceeded". Haven't 
> investigated where the memory usage went, yet, but we should probably have 
> some docs on memory requirements and buffering, etc.





[jira] [Commented] (KUDU-646) Check the scan type before applying the selection vector

2019-06-15 Thread Grant Henke (JIRA)


[ 
https://issues.apache.org/jira/browse/KUDU-646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16864734#comment-16864734
 ] 

Grant Henke commented on KUDU-646:
--

[~adar] is this still relevant after all your changes and optimizations?

> Check the scan type before applying the selection vector
> 
>
> Key: KUDU-646
> URL: https://issues.apache.org/jira/browse/KUDU-646
> Project: Kudu
>  Issue Type: Improvement
>  Components: perf, tablet
>Affects Versions: M5
>Reporter: Andrew Wang
>Priority: Trivial
>
> As pointed out during code review of the MergeIterator, after merging the 
> selection vector is all true (all rows selected).
> This means during a scan, if we are using the MergeIterator, we can skip 
> checking the resulting SelectionVector. This will save us some CPU.





[jira] [Updated] (KUDU-635) Implement clean shutdown

2019-06-15 Thread Grant Henke (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-635:
-
Labels: kudu-roadmap  (was: )

> Implement clean shutdown
> 
>
> Key: KUDU-635
> URL: https://issues.apache.org/jira/browse/KUDU-635
> Project: Kudu
>  Issue Type: Bug
>  Components: recovery
>Affects Versions: M5
>Reporter: Adar Dembo
>Priority: Trivial
>  Labels: kudu-roadmap
>
> Today, a Kudu node's "shutdown" routine is merely exiting abruptly upon 
> receipt of a signal, be it SIGINT, SIGTERM, or (obviously) SIGKILL. Any 
> in-memory state (like MRS or DRS) is lost, and on startup, the WAL must be 
> replayed as part of bootstrap.
> It's not hard to conceive of a cleaner shutdown routine. It'd probably be 
> issued via RPC, and it would perform the following steps:
> # Quiesce the server so that future RPCs are dropped.
> # Abdicate quorum leadership.
> # Flush every MRS/DRS.
> # GC every WAL.
> # Exit gracefully (i.e. run through the TS/Master destructor).
> Kudu is meant to recover in the event of a crash, so why bother with a clean 
> shutdown? Why not make every shutdown an "abrupt" one? Well, a clean shutdown 
> would take more time to run, but would also guarantee faster startup because 
> there'd be less work to do during bootstrap. With a clean shutdown, 
> time("work at shutdown") < time("work at startup"), and that would also help 
> make Kudu rolling restarts more efficient. A similar tack was recently 
> taken in HDFS for the same reason.
> The easy part (step #5 from the above list) was recently implemented 
> [here|http://gerrit.sjc.cloudera.com:8080/#/c/6080/].
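
The ordering of the five steps listed above can be sketched as follows. This is a toy Python model under stated assumptions, not Kudu code: the `Server` class, its fields, and the store and segment names are all hypothetical, and "flushing" and "GC" are reduced to clearing lists.

```python
class Server:
    """Hypothetical stand-in for a tablet server or master; not Kudu's classes."""

    def __init__(self):
        self.accepting_rpcs = True
        self.is_leader = True
        self.unflushed_stores = ["mrs-1", "mrs-2"]  # in-memory state
        self.wal_segments = ["wal-1", "wal-2"]      # would be replayed at bootstrap
        self.log = []

    def clean_shutdown(self):
        # 1. Quiesce: stop accepting new RPCs before touching any state.
        self.accepting_rpcs = False
        self.log.append("quiesce")
        # 2. Abdicate leadership so another replica can take over.
        self.is_leader = False
        self.log.append("abdicate")
        # 3. Flush every in-memory store to disk.
        self.unflushed_stores.clear()
        self.log.append("flush")
        # 4. GC the WAL; in this model that is only done after the flush,
        #    since the WAL is what would otherwise rebuild in-memory state.
        self.wal_segments.clear()
        self.log.append("gc_wal")
        # 5. Exit gracefully (modeled here as a final log entry).
        self.log.append("exit")

    def bootstrap_work(self):
        """Number of WAL segments that would need replay on the next startup."""
        return len(self.wal_segments)


server = Server()
work_after_abrupt_exit = server.bootstrap_work()
server.clean_shutdown()
work_after_clean_exit = server.bootstrap_work()
```

The point of the sketch is only the trade-off the description names: extra work at shutdown leaves less to replay during bootstrap.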





[jira] [Updated] (KUDU-560) Consensus/WAL/Transactions Optimizations and tests

2019-06-15 Thread Grant Henke (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-560:
-
Labels: kudu-roadmap  (was: )

> Consensus/WAL/Transactions Optimizations and tests
> --
>
> Key: KUDU-560
> URL: https://issues.apache.org/jira/browse/KUDU-560
> Project: Kudu
>  Issue Type: Improvement
>  Components: consensus, log
>Affects Versions: M4.5
>Reporter: David Alves
>Priority: Major
>  Labels: kudu-roadmap
>
> This is an umbrella jira for several optimizations and tests that we should 
> add in the near future.





[jira] [Commented] (KUDU-511) Apparent TSAN race in rpc::ConnectionId destructor

2019-06-15 Thread Grant Henke (JIRA)


[ 
https://issues.apache.org/jira/browse/KUDU-511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16864731#comment-16864731
 ] 

Grant Henke commented on KUDU-511:
--

Is this still an issue?

> Apparent TSAN race in rpc::ConnectionId destructor
> --
>
> Key: KUDU-511
> URL: https://issues.apache.org/jira/browse/KUDU-511
> Project: Kudu
>  Issue Type: Bug
>  Components: rpc
>Affects Versions: M4
>Reporter: Mike Percy
>Priority: Major
>
> Seen in a local unit test failure
> {noformat}
> [ RUN  ] TestRpc.TestCallToBadServer
> I0916 20:08:22.244844 17247 rpc-test.cc:98] Status: Network error: RPC 
> connection to 0.0.0.0:0 failed: connect: Connection refused (error 111)
> I0916 20:08:22.245394 17247 rpc-test.cc:98] Status: Network error: RPC 
> connection to 0.0.0.0:0 failed: connect: Connection refused (error 111)
> I0916 20:08:22.245895 17247 rpc-test.cc:98] Status: Network error: RPC 
> connection to 0.0.0.0:0 failed: connect: Connection refused (error 111)
> I0916 20:08:22.246326 17247 rpc-test.cc:98] Status: Network error: RPC 
> connection to 0.0.0.0:0 failed: connect: Connection refused (error 111)
> I0916 20:08:22.246748 17247 rpc-test.cc:98] Status: Network error: RPC 
> connection to 0.0.0.0:0 failed: connect: Connection refused (error 111)
> ==
> WARNING: ThreadSanitizer: data race (pid=17247)
>   Write of size 1 at 0x7d08e33a by main thread:
> #0 operator delete(void*) 
> /home/mpercy/src/kudu/thirdparty/llvm-3.4.2.src/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:592
>  (rpc-test+0x0004b0ab)
> #1  :0 (libstdc++.so.6+0x000ba4de)
> #2 kudu::rpc::ConnectionId::~ConnectionId() 
> /home/mpercy/src/kudu/src/kudu/rpc/outbound_call.h:85 
> (libkrpc.so+0x0008b9ab)
> #3 kudu::rpc::Proxy::~Proxy() 
> /home/mpercy/src/kudu/src/kudu/rpc/proxy.cc:55 (libkrpc.so+0x000a9fcb)
> #4 kudu::rpc::TestRpc_TestCallToBadServer_Test::TestBody() 
> /home/mpercy/src/kudu/src/kudu/rpc/rpc-test.cc:101 (rpc-test+0x000b6073)
> #5 void 
> testing::internal::HandleSehExceptionsInMethodIfSupported void>(testing::Test*, void (testing::Test::*)(), char const*) :0 
> (libgtest.so+0x00062819)
> #6 __libc_start_main /build/buildd/eglibc-2.19/csu/libc-start.c:287 
> (libc.so.6+0x00021ec4)
>   Previous read of size 1 at 0x7d08e33a by thread T2: 
> #0 memcmp 
> /home/mpercy/src/kudu/thirdparty/llvm-3.4.2.src/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:634
>  (rpc-test+0x0004bd1e)
> #1 std::char_traits::compare(char const*, char const*, unsigned 
> long) 
> /usr/lib/gcc/x86_64-linux-gnu/4.8/../../../../include/c++/4.8/bits/char_traits.h:255
>  (rpc-test+0x000c06c9)
> #2 
> _ZSteqIcEN9__gnu_cxx11__enable_ifIXsr9__is_charIT_EE7__valueEbE6__typeERKSbIS2_St11char_traitsIS2_ESaIS2_EESA_
>  
> /usr/lib/gcc/x86_64-linux-gnu/4.8/../../../../include/c++/4.8/bits/basic_string.h:2496
>  (rpc-test+0x000c0681)
> #3 kudu::rpc::UserCredentials::Equals(kudu::rpc::UserCredentials const&) 
> const /home/mpercy/src/kudu/src/kudu/rpc/outbound_call.cc:308:11 
> (libkrpc.so+0x0008af58)
> #4 kudu::rpc::ConnectionId::Equals(kudu::rpc::ConnectionId const&) const 
> /home/mpercy/src/kudu/src/kudu/rpc/outbound_call.cc:369 
> (libkrpc.so+0x0008b35a)
> #5 kudu::rpc::ConnectionIdEqual::operator()(kudu::rpc::ConnectionId 
> const&, kudu::rpc::ConnectionId const&) const 
> /home/mpercy/src/kudu/src/kudu/rpc/outbound_call.cc:377 
> (libkrpc.so+0x0008b3d0)
> #6 std::tr1::__detail::_Hash_code_base std::pair 
> >, std::_Select1st scoped_refptr > >, kudu::rpc::ConnectionIdEqual, 
> kudu::rpc::ConnectionIdHash, std::tr1::__detail::_Mod_range_hashing, 
> std::tr1::__detail::_Default_ranged_hash, 
> false>::_M_compare(kudu::rpc::ConnectionId const&, unsigned long, 
> std::tr1::__detail::_Hash_node scoped_refptr >, false>*) const 
> /usr/lib/gcc/x86_64-linux-gnu/4.8/../../../../include/c++/4.8/tr1/hashtable_policy.h:687
>  (libkrpc.so+0x000b6ac9)
> #7 std::tr1::_Hashtable std::pair 
> >, std::allocator scoped_refptr > >, 
> std::_Select1st scoped_refptr > >, kudu::rpc::ConnectionIdEqual, 
> kudu::rpc::ConnectionIdHash, std::tr1::__detail::_Mod_range_hashing, 
> std::tr1::__detail::_Default_ranged_hash, 
> std::tr1::__detail::_Prime_rehash_policy, false, false, 
> true>::_M_find_node(std::tr1::__detail::_Hash_node  const, scoped_refptr >, false>*, 
> kudu::rpc::ConnectionId const&, unsigned long) const 
> /usr/lib/gcc/x86_64-linux-gnu/4.8/../../../../include/c++/4.8/tr1/hashtable.h:830
>  (libkrpc.so+0x000b5e80)
> #8 std::tr1::_Hashtable std::pair 
> >, std::allocator scoped_refptr > >, 
> std::_Select1st scoped_refptr > >, kudu::rpc::ConnectionIdEqual, 
> kudu::rpc::ConnectionIdHash, 

[jira] [Commented] (KUDU-446) Integrate with Presto

2019-06-15 Thread Grant Henke (JIRA)


[ 
https://issues.apache.org/jira/browse/KUDU-446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16864729#comment-16864729
 ] 

Grant Henke commented on KUDU-446:
--

This was done in Presto here: 
[https://github.com/prestodb/presto/tree/master/presto-kudu]

> Integrate with Presto
> -
>
> Key: KUDU-446
> URL: https://issues.apache.org/jira/browse/KUDU-446
> Project: Kudu
>  Issue Type: Task
>  Components: client
>Affects Versions: M4.5
>Reporter: Jean-Daniel Cryans
>Priority: Trivial
>  Labels: kudu-roadmap
>
> Presto is a relatively popular SQL tool in use in the community. It would be 
> nice to have some support for reading/writing data in Presto for those that 
> use it.
> This is likely to be implemented outside of the Kudu project itself, but this 
> JIRA can serve as a useful place for contributors to discuss the 
> implementation.





[jira] [Resolved] (KUDU-446) Integrate with Presto

2019-06-15 Thread Grant Henke (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke resolved KUDU-446.
--
   Resolution: Fixed
Fix Version/s: NA

> Integrate with Presto
> -
>
> Key: KUDU-446
> URL: https://issues.apache.org/jira/browse/KUDU-446
> Project: Kudu
>  Issue Type: Task
>  Components: client
>Affects Versions: M4.5
>Reporter: Jean-Daniel Cryans
>Priority: Trivial
>  Labels: kudu-roadmap
> Fix For: NA
>
>
> Presto is a relatively popular SQL tool in use in the community. It would be 
> nice to have some support for reading/writing data in Presto for those that 
> use it.
> This is likely to be implemented outside of the Kudu project itself, but this 
> JIRA can serve as a useful place for contributors to discuss the 
> implementation.





[jira] [Commented] (KUDU-1702) Document/Implement read-your-writes for Impala/Spark etc.

2019-06-15 Thread Grant Henke (JIRA)


[ 
https://issues.apache.org/jira/browse/KUDU-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16864728#comment-16864728
 ] 

Grant Henke commented on KUDU-1702:
---

[~hahao] is this done?

> Document/Implement read-your-writes for Impala/Spark etc.
> -
>
> Key: KUDU-1702
> URL: https://issues.apache.org/jira/browse/KUDU-1702
> Project: Kudu
>  Issue Type: Sub-task
>  Components: client, tablet, tserver
>Affects Versions: 1.1.0
>Reporter: David Alves
>Assignee: David Alves
>Priority: Major
>
> Engines like Impala/Spark use many independent client instances, so we should 
> provide a way to have read-your-writes across many independent client 
> instances, which translates to providing a way to get linearizable behavior. 
> At first this can be done using the APIs that are already available. For 
> instance, if the objective is to be sure to have the results of a write in a 
> following scan, the following steps can be taken:
> - After a write the engine should collect the last observed timestamps from 
> kudu clients
> - The engine's coordinator then takes the max of those timestamps, adds 1 and 
> uses that as a snapshot scan timestamp.
> One important pre-requisite of the behavior above is that scans be done in 
> READ_AT_SNAPSHOT mode. Also the steps above currently don't actually 
> guarantee the expected behavior, but should once the current anomalies are 
> taken care of (as part of KUDU-430).
> In the immediate future we'll add APIs to the Kudu client so as to make the 
> inner workings of getting this behavior oblivious to the engine. The steps 
> will still be the same, i.e. timestamps or timestamp tokens will still be 
> passed around, but the kudu client will encapsulate the choice of the 
> timestamp for the scan.
> Later we will add a way to obtain this behavior without timestamp 
> propagation, either by doing a write-side commit-wait, where clients wait out 
> the clock error after/during the last write thus making sure any future 
> operation will have a higher timestamp; or by doing a read-side commit-wait, 
> where we provide an api on the kudu client for the engine to perform a 
> similar call before the scan call to obtain a scan timestamp.
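
The two coordinator steps in the description (collect the last observed timestamps, then take the maximum plus one) can be sketched as below. The function name and the sample timestamps are hypothetical, and the sketch leaves out how an engine would actually read timestamps from its Kudu clients or pass the result to a READ_AT_SNAPSHOT scan.

```python
def snapshot_scan_timestamp(last_observed_timestamps):
    """Return max(last observed timestamps) + 1, as in the steps above."""
    timestamps = list(last_observed_timestamps)
    if not timestamps:
        raise ValueError("need at least one client timestamp")
    return max(timestamps) + 1


# Hypothetical timestamps reported by three independent client instances
# after their writes completed.
observed = [1005, 1012, 1009]
scan_ts = snapshot_scan_timestamp(observed)
```

A scan at `scan_ts` is intended to see every write whose timestamp is in `observed`, subject to the caveat in the description that this only holds once the anomalies tracked in KUDU-430 are resolved.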





[jira] [Assigned] (KUDU-442) Kudu SerDe for hive

2019-06-15 Thread Grant Henke (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke reassigned KUDU-442:


Assignee: Grant Henke  (was: Clemens Valiente)

> Kudu SerDe for hive
> ---
>
> Key: KUDU-442
> URL: https://issues.apache.org/jira/browse/KUDU-442
> Project: Kudu
>  Issue Type: New Feature
>  Components: integration
>Affects Versions: 1.2.0
>Reporter: Todd Lipcon
>Assignee: Grant Henke
>Priority: Minor
>  Labels: kudu-roadmap
>
> Though Impala is the horse we are betting on, we should probably build a Hive 
> SerDe as well, since Hive still supports many features that Impala doesn't. 
> This is potentially a place to leverage community contribution.





[jira] [Resolved] (KUDU-355) Create tests for the tools

2019-06-15 Thread Grant Henke (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke resolved KUDU-355.
--
   Resolution: Fixed
Fix Version/s: n/a

> Create tests for the tools
> --
>
> Key: KUDU-355
> URL: https://issues.apache.org/jira/browse/KUDU-355
> Project: Kudu
>  Issue Type: Task
>  Components: ops-tooling
>Affects Versions: M4
>Reporter: David Alves
>Priority: Minor
> Fix For: n/a
>
>
> We frequently break the tools because we have no tests. For instance log-dump 
> is now broken because something schema-related changed and no one used it 
> for a while.
> We should add some minimal tests to make sure that tools still work when 
> making other changes.





[jira] [Commented] (KUDU-364) UBSAN error in stopwatch

2019-06-15 Thread Grant Henke (JIRA)


[ 
https://issues.apache.org/jira/browse/KUDU-364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16864727#comment-16864727
 ] 

Grant Henke commented on KUDU-364:
--

Is this fixed now?

> UBSAN error in stopwatch
> 
>
> Key: KUDU-364
> URL: https://issues.apache.org/jira/browse/KUDU-364
> Project: Kudu
>  Issue Type: Bug
>  Components: util
>Affects Versions: M4.5
>Reporter: Todd Lipcon
>Priority: Minor
>
> Got the following error in a run of 
> TabletServerTest.TestCreateTablet_TabletExists:
> /var/lib/jenkins/workspace/kudu-test/BUILD_TYPE/ASAN/label/centos6-kudu/src/util/stopwatch.h:192:43:
>  runtime error: signed integer overflow: 18446744073 * 10 cannot be 
> represented in type 'long'





[jira] [Commented] (KUDU-355) Create tests for the tools

2019-06-15 Thread Grant Henke (JIRA)


[ 
https://issues.apache.org/jira/browse/KUDU-355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16864726#comment-16864726
 ] 

Grant Henke commented on KUDU-355:
--

Resolving this as the tools have lots of tests now and any missing tests can be 
tracked by separate jiras.

> Create tests for the tools
> --
>
> Key: KUDU-355
> URL: https://issues.apache.org/jira/browse/KUDU-355
> Project: Kudu
>  Issue Type: Task
>  Components: ops-tooling
>Affects Versions: M4
>Reporter: David Alves
>Priority: Minor
>
> We frequently break the tools because we have no tests. For instance log-dump 
> is now broken because something schema-related changed and no one used it 
> for a while.
> We should add some minimal tests to make sure that tools still work when 
> making other changes.





[jira] [Commented] (KUDU-2835) Add custom id in RpcHeader

2019-06-15 Thread Grant Henke (JIRA)


[ 
https://issues.apache.org/jira/browse/KUDU-2835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16864725#comment-16864725
 ] 

Grant Henke commented on KUDU-2835:
---

Another related Jira for this is KUDU-351. Adding a session descriptor could 
allow for differentiation between work even when the client is shared or 
reused. 

> Add custom id in RpcHeader
> --
>
> Key: KUDU-2835
> URL: https://issues.apache.org/jira/browse/KUDU-2835
> Project: Kudu
>  Issue Type: Improvement
>Reporter: Xu Yao
>Assignee: Xu Yao
>Priority: Major
>
> In our production environment, there are many distributed jobs that send 
> requests to Kudu via KuduClient. However, if there are RPC timeouts on the 
> server, it is difficult to find the affected KuduClient based on the 
> information in rpcz, because there may be many KuduClients on each host.
> So we want to add extra information to RpcHeader to identify the problematic 
> distributed tasks.





[jira] [Commented] (KUDU-245) Add ability to limit size of threadpool queue

2019-06-15 Thread Grant Henke (JIRA)


[ 
https://issues.apache.org/jira/browse/KUDU-245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16864720#comment-16864720
 ] 

Grant Henke commented on KUDU-245:
--

Is this Jira done? Did the follow-on jiras get filed?

> Add ability to limit size of threadpool queue
> -
>
> Key: KUDU-245
> URL: https://issues.apache.org/jira/browse/KUDU-245
> Project: Kudu
>  Issue Type: Improvement
>  Components: supportability
>Affects Versions: M3
>Reporter: Mike Percy
>Assignee: Mike Percy
>Priority: Major
> Fix For: Backlog
>
>
> Currently, the threadpool queue has no limit. This can result in a lot of 
> wasted work and lack of backpressure for RPC services, including consensus, 
> that simply queue up work when it may time out later.
> We should parameterize the threadpool queue and fail to Submit to the 
> threadpool if the queue is beyond the specified threshold.
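
A minimal Python sketch of the proposed behavior, where submission fails once the queue reaches a configured limit, might look like the following. `BoundedTaskQueue` and its methods are hypothetical names for illustration and do not mirror Kudu's ThreadPool interface; worker threads are omitted so the rejection path is easy to see.

```python
import queue


class BoundedTaskQueue:
    """Hypothetical sketch of a size-limited task queue; not Kudu's ThreadPool."""

    def __init__(self, max_queue_size):
        # queue.Queue treats maxsize <= 0 as "unbounded", so reject that here.
        if max_queue_size <= 0:
            raise ValueError("max_queue_size must be positive")
        self._queue = queue.Queue(maxsize=max_queue_size)

    def submit(self, task):
        """Return True if the task was queued, False if the queue is full."""
        try:
            self._queue.put_nowait(task)
            return True
        except queue.Full:
            # Failing fast gives the caller backpressure instead of letting
            # work pile up and time out later.
            return False

    def run_one(self):
        """Run the oldest queued task and return its result."""
        task = self._queue.get_nowait()
        return task()


pool = BoundedTaskQueue(max_queue_size=2)
accepted = [pool.submit(lambda: "a"), pool.submit(lambda: "b"), pool.submit(lambda: "c")]
```

Here the third submission is rejected because the limit of two has been reached; after a task is drained, new submissions are accepted again.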





[jira] [Commented] (KUDU-81) rpc-test TestConnectionKeepalive failure

2019-06-15 Thread Grant Henke (JIRA)


[ 
https://issues.apache.org/jira/browse/KUDU-81?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16864711#comment-16864711
 ] 

Grant Henke commented on KUDU-81:
-

Should this be closed [~tlipcon]?

> rpc-test TestConnectionKeepalive failure
> 
>
> Key: KUDU-81
> URL: https://issues.apache.org/jira/browse/KUDU-81
> Project: Kudu
>  Issue Type: Bug
>  Components: rpc, test
>Affects Versions: M4
>Reporter: Todd Lipcon
>Priority: Trivial
>  Labels: flaky
>
> Saw this fail once:
> {code}
> /var/lib/jenkins/workspace/kudu-test/BUILD_TYPE/LEAKCHECK/label/centos6-kudu/src/rpc/rpc-test.cc:155:
>  Failure
> Value of: metrics.num_server_connections_
>   Actual: 1
> Expected: 0
> Server should have 0 server connections
> {code}
> Probably just a timing issue in the test.





[jira] [Created] (KUDU-2862) Write a docker based quickstart

2019-06-15 Thread Grant Henke (JIRA)
Grant Henke created KUDU-2862:
-

 Summary: Write a docker based quickstart 
 Key: KUDU-2862
 URL: https://issues.apache.org/jira/browse/KUDU-2862
 Project: Kudu
  Issue Type: Task
  Components: documentation
Reporter: Grant Henke


We removed the VM based quickstart image a while ago, but the 
functionality/guide was useful for allowing users to experiment with Kudu 
quickly. We should write a new quickstart guide using the Docker images now 
that they are published.





[jira] [Updated] (KUDU-2861) Docker examples with external access

2019-06-15 Thread Grant Henke (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-2861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-2861:
--
Description: Write some docker examples that use a custom network and the 
`advertised_addresses` configuration to allow for access from outside of the 
cluster. These examples should also discuss configuring and using volumes.  
(was: Write some docker examples that use a custom network and the 
`advertised_addresses` configuration to allow for access from outside of the 
cluster. )

> Docker examples with external access
> 
>
> Key: KUDU-2861
> URL: https://issues.apache.org/jira/browse/KUDU-2861
> Project: Kudu
>  Issue Type: Task
>  Components: docker
>Reporter: Grant Henke
>Priority: Major
>  Labels: docker
>
> Write some docker examples that use a custom network and the 
> `advertised_addresses` configuration to allow for access from outside of the 
> cluster. These examples should also discuss configuring and using volumes.





[jira] [Created] (KUDU-2861) Docker examples with external access

2019-06-15 Thread Grant Henke (JIRA)
Grant Henke created KUDU-2861:
-

 Summary: Docker examples with external access
 Key: KUDU-2861
 URL: https://issues.apache.org/jira/browse/KUDU-2861
 Project: Kudu
  Issue Type: Task
  Components: docker
Reporter: Grant Henke


Write some docker examples that use a custom network and the 
`advertised_addresses` configuration to allow for access from outside of the 
cluster. 





[jira] [Created] (KUDU-2860) Sign docker images

2019-06-15 Thread Grant Henke (JIRA)
Grant Henke created KUDU-2860:
-

 Summary: Sign docker images
 Key: KUDU-2860
 URL: https://issues.apache.org/jira/browse/KUDU-2860
 Project: Kudu
  Issue Type: Improvement
  Components: docker
Reporter: Grant Henke
Assignee: Grant Henke


We should sign the Apache docker images following the instructions here: 
[https://docs.docker.com/ee/dtr/user/manage-images/sign-images/]

 

Ideally this would be handled by the build script.





[jira] [Created] (KUDU-2859) Built in NTP client

2019-06-15 Thread Grant Henke (JIRA)
Grant Henke created KUDU-2859:
-

 Summary: Built in NTP client
 Key: KUDU-2859
 URL: https://issues.apache.org/jira/browse/KUDU-2859
 Project: Kudu
  Issue Type: New Feature
  Components: server
Reporter: Grant Henke


Requiring ntpd or chrony to be set up and configured correctly makes 
deployments more complicated. 

 

We should include our own built-in, stripped-down implementation of NTP without 
any reliance on kernel features. This should make it easier for users to 
configure NTP even if they don't have root, and it may also keep clock error 
lower than the system implementation, since we can prioritize low error bounds 
rather than low jitter.

 

Additionally, this simplifies Docker deployments: they would no longer require 
CAP_SYS_TIME or privileged containers, nor host machines with ntpd set up 
correctly.
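
For context on what a stripped-down NTP client has to compute, the sketch below shows the standard NTP clock offset and round-trip delay calculation from the four timestamps of one request/response exchange. It is not taken from any Kudu implementation; the function name and the sample values (in milliseconds) are illustrative assumptions.

```python
def ntp_offset_and_delay(t0, t1, t2, t3):
    """Standard NTP on-wire calculation for one exchange.

    t0: client send time (client clock)
    t1: server receive time (server clock)
    t2: server send time (server clock)
    t3: client receive time (client clock)
    """
    # Estimated amount by which the server clock is ahead of the client clock.
    offset = ((t1 - t0) + (t2 - t3)) / 2
    # Round-trip network delay, excluding the server's processing time.
    delay = (t3 - t0) - (t2 - t1)
    return offset, delay


# Hypothetical exchange, all values in milliseconds.
offset_ms, delay_ms = ntp_offset_and_delay(t0=1000, t1=6020, t2=6030, t3=1050)
# The offset estimate can be wrong by at most half the round-trip delay.
max_error_ms = delay_ms / 2
```

Because the error bound scales with the measured delay, a client that cares about low error bounds would tend to prefer samples with short round trips.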




