[jira] [Created] (KUDU-1657) read-only FsManager::Open on active tablet can crash

2016-09-27 Thread Dan Burkert (JIRA)
Dan Burkert created KUDU-1657:
-

 Summary: read-only FsManager::Open on active tablet can crash
 Key: KUDU-1657
 URL: https://issues.apache.org/jira/browse/KUDU-1657
 Project: Kudu
  Issue Type: Bug
Reporter: Dan Burkert
Assignee: Dan Burkert


alter_table-randomized-test.cc is currently flaky due to a crash in the 
LogVerifier that happens because FsManager is not robust to running in 
read-only mode against an actively writing tablet. The root of the issue is a 
stale data container length that is used after reading new metadata. The 
failure results in log messages such as:

{code}
F0927 19:37:39.883033 22107 log_block_manager.cc:535] Found malformed block 
record in data file: 
/tmp/kudutest-4348/insert-verify-itest.InsertVerifyITest.TestInsertAndVerify.1475030222707874-17327/minicluster-data/ts-0/data/e4ade118175d48cabd2085014a6d762e.data
Record: block_id {
  id: 1525
}
op_type: CREATE
timestamp_us: 1475030259882913
offset: 5840896
length: 279030

Data file size: 6119892
*** Check failure stack trace: ***
@ 0x7f86ce57bf5d  google::LogMessage::Fail() at ??:0
@ 0x7f86ce57de5d  google::LogMessage::SendToLog() at ??:0
@ 0x7f86ce57ba99  google::LogMessage::Flush() at ??:0
@ 0x7f86ce57e8ff  google::LogMessageFatal::~LogMessageFatal() at ??:0
@ 0x7f86cfe4e32b  kudu::fs::internal::LogBlockContainer::CheckBlockRecord() at ??:0
@ 0x7f86cfe4dc8d  kudu::fs::internal::LogBlockContainer::ReadContainerRecords() at ??:0
@ 0x7f86cfe5731a  kudu::fs::LogBlockManager::OpenRootPath() at ??:0
@ 0x7f86cfe69023  kudu::internal::RunnableAdapter<>::Run() at ??:0
@ 0x7f86cfe66959  kudu::internal::InvokeHelper<>::MakeItSo() at ??:0
@ 0x7f86cfe63a77  kudu::internal::Invoker<>::Run() at ??:0
@ 0x7f86d598b542  kudu::Callback<>::Run() at ??:0
@ 0x7f86d598fe61  boost::_mfi::cmf0<>::operator()() at ??:0
@ 0x7f86d598f93e  boost::_bi::list1<>::operator()<>() at ??:0
@ 0x7f86d598f05d  boost::_bi::bind_t<>::operator()() at ??:0
@ 0x7f86d598e860  boost::detail::function::void_function_obj_invoker0<>::invoke() at ??:0
@ 0x7f86d1296732  boost::function0<>::operator()() at ??:0
@ 0x7f86cf402124  kudu::FunctionRunnable::Run() at ??:0
@ 0x7f86cf401556  kudu::ThreadPool::DispatchThread() at ??:0
@ 0x7f86cf405824  boost::_mfi::mf1<>::operator()() at ??:0
@ 0x7f86cf40542b  boost::_bi::list2<>::operator()<>() at ??:0
@ 0x7f86cf404ecd  boost::_bi::bind_t<>::operator()() at ??:0
@ 0x7f86cf4047fe  boost::detail::function::void_function_obj_invoker0<>::invoke() at ??:0
@ 0x7f86d1296732  boost::function0<>::operator()() at ??:0
@ 0x7f86cf3f8717  kudu::Thread::SuperviseThread() at ??:0
@   0x3ae0e079d1  (unknown) at ??:0
@   0x3ae0ae88fd  (unknown) at ??:0
@  (nil)  (unknown)
{code}
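
The numbers in the log above can be checked by hand: the record's offset plus its length ends past the reported data file size, which is consistent with a stale length being compared against newer metadata. Below is a minimal Python sketch of that bounds check. The helper name `record_fits` is hypothetical and is not Kudu's CheckBlockRecord() implementation; only the three numeric values come from the log.

```python
def record_fits(offset: int, length: int, data_file_size: int) -> bool:
    """Return True if the block record lies entirely inside the data file."""
    return offset >= 0 and length >= 0 and offset + length <= data_file_size


# Values copied from the failure log above.
offset = 5840896
length = 279030
stale_size = 6119892

record_end = offset + length          # where the block record claims to end
overshoot = record_end - stale_size   # bytes past the (stale) file size

print(record_end, overshoot, record_fits(offset, length, stale_size))
```

If the file size were sampled again after the metadata had been read, the same record would presumably pass the check, since the writer had already extended the file by then.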



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KUDU-1656) Scanner timeouts aren't retried when waiting on a transaction

2016-09-27 Thread Jean-Daniel Cryans (JIRA)
Jean-Daniel Cryans created KUDU-1656:


 Summary: Scanner timeouts aren't retried when waiting on a 
transaction
 Key: KUDU-1656
 URL: https://issues.apache.org/jira/browse/KUDU-1656
 Project: Kudu
  Issue Type: Bug
  Components: tserver
Reporter: Jean-Daniel Cryans


I recently changed ITClient to use READ_AT_SNAPSHOT scanners and we've been 
seeing errors like this:

{noformat}
19:56:29.459 [WARN - New I/O worker #169] (AsyncKuduScanner.java:407) Can not 
open scanner
org.apache.kudu.client.NonRecoverableException: could not wait for desired 
snapshot timestamp to be consistent: Timed out waiting for all transactions 
with ts < P: 1475006188645381 usec, L: 0 to commit
at 
org.apache.kudu.client.TabletClient.dispatchTSErrorOrReturnException(TabletClient.java:548)
at org.apache.kudu.client.TabletClient.decode(TabletClient.java:482)
at org.apache.kudu.client.TabletClient.decode(TabletClient.java:83)
{noformat}

Since this comes back as a TimedOut AppStatus, neither client retries the 
error, which doesn't seem to be the behavior the server side expects: 
https://github.com/cloudera/kudu/blob/be719edc3581802e094c3af6a88d67acba44ba71/src/kudu/tserver/tablet_service.cc#L1764

On one hand it seems weird to rely on the user to retry only certain timeouts; 
OTOH maybe it shouldn't be sent as a timeout? But I'm not sure what it should 
be.
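
To make the first option concrete, here is a rough Python sketch of a caller that retries only this kind of timeout until an overall deadline passes. Everything in it is an assumption for illustration: `SnapshotWaitTimeout` and `open_scanner_with_retry` are hypothetical names, and this is not how the Kudu Java or C++ client behaves today.

```python
import time


class SnapshotWaitTimeout(Exception):
    """Hypothetical error: the server timed out waiting for transactions to commit."""


def open_scanner_with_retry(open_scanner, deadline_s, backoff_s=0.01,
                            clock=time.monotonic, sleep=time.sleep):
    """Call open_scanner() until it succeeds or the overall deadline would pass."""
    deadline = clock() + deadline_s
    attempts = 0
    while True:
        attempts += 1
        try:
            return open_scanner(), attempts
        except SnapshotWaitTimeout:
            # Give up if another backoff would take us past the deadline.
            if clock() + backoff_s >= deadline:
                raise
            sleep(backoff_s)


# Stand-in for a scanner open that fails twice before succeeding.
calls = {"n": 0}


def flaky_open():
    calls["n"] += 1
    if calls["n"] < 3:
        raise SnapshotWaitTimeout("timed out waiting for transactions to commit")
    return "scanner"


result, attempts = open_scanner_with_retry(flaky_open, deadline_s=5.0)
print(result, attempts)
```

The sketch also shows why the status code matters: the caller has to tell this timeout apart from other timeouts to know that a retry is safe.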





[jira] [Created] (KUDU-1655) Update docs for ASF maven repository coordinates

2016-09-27 Thread Todd Lipcon (JIRA)
Todd Lipcon created KUDU-1655:
-

 Summary: Update docs for ASF maven repository coordinates
 Key: KUDU-1655
 URL: https://issues.apache.org/jira/browse/KUDU-1655
 Project: Kudu
  Issue Type: Bug
Affects Versions: 1.0.0
Reporter: Todd Lipcon


The docs are still pointing to Cloudera maven repos, but we now publish to the 
ASF repo.

We should also update the "spark-shell" example to use the "--packages" 
argument since Kudu's available in Maven.





[jira] [Updated] (KUDU-1363) Add IN-list predicate type

2016-09-27 Thread Dan Burkert (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-1363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dan Burkert updated KUDU-1363:
--
Status: In Review  (was: In Progress)

> Add IN-list predicate type
> --
>
> Key: KUDU-1363
> URL: https://issues.apache.org/jira/browse/KUDU-1363
> Project: Kudu
>  Issue Type: Sub-task
>  Components: client, perf, tablet
>Reporter: Chris George
>Assignee: Dan Burkert
>
> Currently, adding multiple column range predicates for the same column 
> effectively ANDs the predicates together, so no results are returned. 
> An IN-list predicate would greatly improve performance, since I could 
> complete in one scan what would otherwise take two.
> As an example using the java api:
> ColumnRangePredicate columnRangePredicateColumnNameA = new 
> ColumnRangePredicate(new ColumnSchema.ColumnSchemaBuilder("column_name", 
> Type.STRING).build());
> columnRangePredicateColumnNameA.setLowerBound("A");
> columnRangePredicateColumnNameA.setUpperBound("A");
> ColumnRangePredicate columnRangePredicateColumnNameB = new 
> ColumnRangePredicate(new ColumnSchema.ColumnSchemaBuilder("column_name", 
> Type.STRING).build());
> columnRangePredicateColumnNameB.setLowerBound("B");
> columnRangePredicateColumnNameB.setUpperBound("B");
> which would be equivalent:
> select * from some_table where column_name="A" or column_name="B"
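
The difference between the two behaviors can be sketched in a few lines of Python. This is only an illustration of the semantics described above, not the Kudu client API; it assumes the lower and upper bounds are inclusive, as the Java example implies, and `in_range` is a hypothetical helper.

```python
rows = [{"column_name": "A"}, {"column_name": "B"}, {"column_name": "C"}]


def in_range(value, lower, upper):
    """Inclusive range check, standing in for setLowerBound/setUpperBound."""
    return lower <= value <= upper


# Two range predicates on the same column are ANDed, so no row satisfies both.
anded = [r for r in rows
         if in_range(r["column_name"], "A", "A")
         and in_range(r["column_name"], "B", "B")]

# An IN-list predicate matches a row if its value equals any listed value.
in_list = {"A", "B"}
matched = [r for r in rows if r["column_name"] in in_list]

print(anded)
print(matched)
```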





[jira] [Comment Edited] (KUDU-1642) Add IS NULL predicate type

2016-09-27 Thread Alexey Serbin (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15526919#comment-15526919
 ] 

Alexey Serbin edited comment on KUDU-1642 at 9/27/16 6:08 PM:
--

Yes, psql supports NULL in IN-list predicates.  At least with PostgreSQL 9.3.  
Probably, that's done to support sub-selects like {{SELECT * FROM x WHERE 
field_x IN (SELECT field_y FROM y)}}; besides, they have stored procedures and 
pgsql, so it's crucial to provide syntax consistency there.  By itself, {{WHERE 
field_x IN (NULL)}} should result in an empty result set by definition, since 
it's the same as {{WHERE field_x = NULL}}.

I'm not sure whether supporting that brings any value to the Kudu project as is 
since sub-selects are not supported in Kudu now, AFAIK. It's more about syntax 
consistency.

{noformat}
postgres@ubuntu-14:~$ psql
psql (9.3.13)
Type "help" for help.

postgres=# INSERT INTO x VALUES (0, 1);
INSERT 0 1
postgres=# INSERT INTO x VALUES (1, NULL);
INSERT 0 1
postgres=# SELECT * FROM x;
 a | b 
---+---
 0 | 1
 1 |  
(2 rows)

postgres=# SELECT * FROM x WHERE a IN (0, NULL);
 a | b 
---+---
 0 | 1
(1 row)

postgres=# SELECT * FROM x WHERE b IN (1, NULL);
 a | b 
---+---
 0 | 1
(1 row)

postgres=# SELECT * FROM x WHERE b IN (NULL);
 a | b 
---+---
(0 rows)
{noformat}
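
The psql results above follow from SQL's three-valued logic, which can be modelled in a short Python sketch. `None` stands in for both NULL and UNKNOWN; the helper names `sql_eq`, `sql_in` and `where` are made up for this illustration and are not part of PostgreSQL or Kudu.

```python
def sql_eq(a, b):
    """SQL equality: comparing with NULL (None) yields UNKNOWN (None)."""
    if a is None or b is None:
        return None
    return a == b


def sql_in(value, candidates):
    """SQL IN: TRUE if any comparison is TRUE, else UNKNOWN if any is UNKNOWN."""
    results = [sql_eq(value, c) for c in candidates]
    if any(r is True for r in results):
        return True
    if any(r is None for r in results):
        return None
    return False


def where(rows, predicate):
    """A WHERE clause keeps only rows whose predicate is TRUE."""
    return [row for row in rows if predicate(row) is True]


# The table x from the psql session above: columns (a, b).
x = [(0, 1), (1, None)]

a_in = where(x, lambda r: sql_in(r[0], [0, None]))    # a IN (0, NULL)
b_in = where(x, lambda r: sql_in(r[1], [1, None]))    # b IN (1, NULL)
b_null = where(x, lambda r: sql_in(r[1], [None]))     # b IN (NULL)

print(a_in, b_in, b_null)
```

In this model {{b IN (NULL)}} is never TRUE, not even for the row where b is NULL, which matches the empty result in the session.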


was (Author: aserbin):
Yes, psql supports NULL in IN-list predicates.  At least with PostgreSQL 9.3.  
Probably, that's done to support sub-selects like {{SELECT * FROM x WHERE 
field_x IN (SELECT field_y FROM y)}}.  By itself, {{WHERE field_x IN (NULL)}} 
should result in empty resultset by definition, since it's the same as {{WHERE 
field_x = NULL}}.

I'm not sure whether supporting that brings any value to the Kudu project as is 
since sub-selects are not supported in Kudu now, AFAIK. It's more about syntax 
consistency.

{noformat}
postgres@ubuntu-14:~$ psql
psql (9.3.13)
Type "help" for help.

postgres=# INSERT INTO x VALUES (0, 1);
INSERT 0 1
postgres=# INSERT INTO x VALUES (1, NULL);
INSERT 0 1
postgres=# SELECT * FROM x;
 a | b 
---+---
 0 | 1
 1 |  
(2 rows)

postgres=# SELECT * FROM x WHERE a IN (0, NULL);
 a | b 
---+---
 0 | 1
(1 row)

postgres=# SELECT * FROM x WHERE b IN (1, NULL);
 a | b 
---+---
 0 | 1
(1 row)

postgres=# SELECT * FROM x WHERE b IN (NULL);
 a | b 
---+---
(0 rows)
{noformat}

> Add IS NULL predicate type
> --
>
> Key: KUDU-1642
> URL: https://issues.apache.org/jira/browse/KUDU-1642
> Project: Kudu
>  Issue Type: Sub-task
>  Components: client, tablet
>Reporter: Dan Burkert
>






[jira] [Resolved] (KUDU-1652) Partition pruning / scan optimization fails with IS NOT NULL predicate on PK column

2016-09-27 Thread Dan Burkert (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-1652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dan Burkert resolved KUDU-1652.
---
   Resolution: Fixed
Fix Version/s: 1.0.1
   1.0.0

> Partition pruning / scan optimization fails with IS NOT NULL predicate on PK 
> column
> ---
>
> Key: KUDU-1652
> URL: https://issues.apache.org/jira/browse/KUDU-1652
> Project: Kudu
>  Issue Type: Sub-task
>  Components: client, tablet
>Affects Versions: 1.0.0
>Reporter: Dan Burkert
>Assignee: Dan Burkert
>Priority: Blocker
> Fix For: 1.0.0, 1.0.1
>
>
> Both the Java client and C++ client/server currently have a bug where 
> attempting a scan with an {{IS NOT NULL}} predicate on a primary key column 
> can throw an exception (Java), or crash the C++ client or server.  This is 
> a rare situation currently since {{IS NOT NULL}} is not publicly accessible, 
> so it has to come from a simplified predicate like {{my_int8_column <= 127}}. 
>  The fix is straightforward: stop encoding the lower/upper bound keys when an 
> {{IS NOT NULL}} predicate is encountered.
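
As a rough illustration of the description above, the Python sketch below shows how a bound covering the whole int8 range can simplify to IS NOT NULL, and how bound-key encoding can stop at such a predicate. The tuple representation and the names `simplify_upper_bound` and `upper_bound_prefix` are hypothetical; this is not Kudu's code, only one reading of the fix as described.

```python
INT8_MIN, INT8_MAX = -128, 127


def simplify_upper_bound(column, upper_inclusive):
    """Simplify `column <= upper_inclusive` for an int8 column."""
    if upper_inclusive >= INT8_MAX:
        # Every non-null int8 value satisfies the bound; only nullness matters.
        return ("IS NOT NULL", column)
    if upper_inclusive < INT8_MIN:
        # No int8 value can satisfy the bound.
        return ("NONE", column)
    return ("RANGE", column, None, upper_inclusive)


def upper_bound_prefix(pk_predicates):
    """Collect upper bounds for leading key columns, stopping at the first
    predicate that has no bound to encode (such as IS NOT NULL)."""
    prefix = []
    for pred in pk_predicates:
        if pred[0] != "RANGE":
            break
        prefix.append(pred[3])
    return prefix


simplified = simplify_upper_bound("my_int8_column", 127)
bounds = upper_bound_prefix([
    ("RANGE", "a", None, 5),
    simplified,
    ("RANGE", "c", None, 7),
])
print(simplified, bounds)
```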





[jira] [Comment Edited] (KUDU-1642) Add IS NULL predicate type

2016-09-27 Thread Alexey Serbin (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15526919#comment-15526919
 ] 

Alexey Serbin edited comment on KUDU-1642 at 9/27/16 5:55 PM:
--

Yes, psql supports NULL in IN-list predicates.  At least with PostgreSQL 9.3.  
Probably, that's done to support sub-selects like {{SELECT * FROM x WHERE 
field_x IN (SELECT field_y FROM y)}}.  By itself, {{WHERE field_x IN (NULL)}} 
should result in an empty result set by definition, since it's the same as 
{{WHERE field_x = NULL}}.

I'm not sure whether supporting that brings any value to the Kudu project as is 
since sub-selects are not supported in Kudu now, AFAIK. It's more about syntax 
consistency.

{noformat}
postgres@ubuntu-14:~$ psql
psql (9.3.13)
Type "help" for help.

postgres=# INSERT INTO x VALUES (0, 1);
INSERT 0 1
postgres=# INSERT INTO x VALUES (1, NULL);
INSERT 0 1
postgres=# SELECT * FROM x;
 a | b 
---+---
 0 | 1
 1 |  
(2 rows)

postgres=# SELECT * FROM x WHERE a IN (0, NULL);
 a | b 
---+---
 0 | 1
(1 row)

postgres=# SELECT * FROM x WHERE b IN (1, NULL);
 a | b 
---+---
 0 | 1
(1 row)

postgres=# SELECT * FROM x WHERE b IN (NULL);
 a | b 
---+---
(0 rows)
{noformat}


was (Author: aserbin):
Yes, psql supports NULL in IN-list predicates.  At least with PostgreSQL 9.3.  
Probably, that's done to support sub-selects like {{SELECT * FROM x WHERE 
field_x IN (SELECT field_y FROM y)}}, because {{WHERE field_x IN (NULL)}} should 
result in empty resultset by definition (it's the same as {{WHERE field_x = 
NULL}}).

{noformat}
postgres@ubuntu-14:~$ psql
psql (9.3.13)
Type "help" for help.

postgres=# INSERT INTO x VALUES (0, 1);
INSERT 0 1
postgres=# INSERT INTO x VALUES (1, NULL);
INSERT 0 1
postgres=# SELECT * FROM x;
 a | b 
---+---
 0 | 1
 1 |  
(2 rows)

postgres=# SELECT * FROM x WHERE a IN (0, NULL);
 a | b 
---+---
 0 | 1
(1 row)

postgres=# SELECT * FROM x WHERE b IN (1, NULL);
 a | b 
---+---
 0 | 1
(1 row)

postgres=# SELECT * FROM x WHERE b IN (NULL);
 a | b 
---+---
(0 rows)
{noformat}

> Add IS NULL predicate type
> --
>
> Key: KUDU-1642
> URL: https://issues.apache.org/jira/browse/KUDU-1642
> Project: Kudu
>  Issue Type: Sub-task
>  Components: client, tablet
>Reporter: Dan Burkert
>






[jira] [Commented] (KUDU-1642) Add IS NULL predicate type

2016-09-27 Thread Alexey Serbin (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15526919#comment-15526919
 ] 

Alexey Serbin commented on KUDU-1642:
-

Yes, psql supports NULL in IN-list predicates.  At least with PostgreSQL 9.3.  
Probably, that's done to support sub-selects like {{SELECT * FROM x WHERE 
field_x IN (SELECT field_y FROM y)}}, because {{WHERE field_x IN (NULL)}} should 
result in an empty result set by definition (it's the same as {{WHERE field_x = 
NULL}}).

{noformat}
postgres@ubuntu-14:~$ psql
psql (9.3.13)
Type "help" for help.

postgres=# INSERT INTO x VALUES (0, 1);
INSERT 0 1
postgres=# INSERT INTO x VALUES (1, NULL);
INSERT 0 1
postgres=# SELECT * FROM x;
 a | b 
---+---
 0 | 1
 1 |  
(2 rows)

postgres=# SELECT * FROM x WHERE a IN (0, NULL);
 a | b 
---+---
 0 | 1
(1 row)

postgres=# SELECT * FROM x WHERE b IN (1, NULL);
 a | b 
---+---
 0 | 1
(1 row)

postgres=# SELECT * FROM x WHERE b IN (NULL);
 a | b 
---+---
(0 rows)
{noformat}

> Add IS NULL predicate type
> --
>
> Key: KUDU-1642
> URL: https://issues.apache.org/jira/browse/KUDU-1642
> Project: Kudu
>  Issue Type: Sub-task
>  Components: client, tablet
>Reporter: Dan Burkert
>






[jira] [Resolved] (KUDU-1619) Master Web UI "Tablet Servers" tab should separate live and suspected dead tablet servers

2016-09-27 Thread Will Berkeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-1619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Berkeley resolved KUDU-1619.
-
   Resolution: Fixed
Fix Version/s: 1.1.0

Resolved by Ninad in 376f95b6dc19ceb13221f02851cf58366f71825d

> Master Web UI "Tablet Servers" tab should separate live and suspected dead 
> tablet servers
> -
>
> Key: KUDU-1619
> URL: https://issues.apache.org/jira/browse/KUDU-1619
> Project: Kudu
>  Issue Type: Improvement
>  Components: master
>Reporter: Dan Burkert
>Assignee: Ninad Shringarpure
>  Labels: newbie
> Fix For: 1.1.0
>
>
> We already list the count of live and dead tablet servers; we should split 
> them into separate tables.





[jira] [Commented] (KUDU-352) Decide on and implement within-batch ordering for client API

2016-09-27 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15526783#comment-15526783
 ] 

Jean-Daniel Cryans commented on KUDU-352:
-

The solution I had implemented in the Java client that you were referring to in 
your May 14th 2015 comment was completely refactored this summer and we still 
retain ordering. So at least on this side we're good.

> Decide on and implement within-batch ordering for client API
> 
>
> Key: KUDU-352
> URL: https://issues.apache.org/jira/browse/KUDU-352
> Project: Kudu
>  Issue Type: New Feature
>  Components: client
>Affects Versions: M5
>Reporter: Vladimir Feinberg
>
> Currently, when the client applies a sequence of WriteOperations to a 
> session without flushing (within a single batch), the batcher runs tablet 
> location lookup asynchronously (see method Batcher::TabletLookupFinished). 
> Thus, it is possible that within the same batch, even with manual flushing, 
> the PerTSBuffer is flushed out of order (causing operations to arrive 
> out-of-order on the server side).
> A contract needs to be designed (and applied to both C++ and Java APIs) 
> regarding the strength of the ordering within the batches.
> Some options:
> 1. No order guaranteed (current). Client must manually flush between batches 
> to ensure order.
> 2. Per-row order guarantee - operations are sent to the server where for a 
> given key, the sequence of operations is preserved.
> 3. Strict ordering guarantee. Independent of keys, order of batch is matched.
> Things to consider:
> -> Is (2) different from (3)? With HybridTime, the client should only see 
> changes atomically on a per-batch level with concurrent reads. Then 
> between-row operations do not matter (until multi-row transactions are 
> introduced).
> -> A flexible version of the API that could include BarrierWriteOperations 
> which would allow the user to control order within batches themselves.
> -> Simplifying things entirely, removing all order (force the client to use a 
> transaction or flushes to ensure order).
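
On the question of whether option (2) differs from option (3), a small Python model may help. It assumes, purely for illustration, that each key maps to exactly one server and that each per-server buffer keeps batch order while the buffers themselves are flushed in an arbitrary order. All function names are hypothetical and nothing here is the Batcher or PerTSBuffer code.

```python
from collections import defaultdict


def partition_ops(ops, server_for_key):
    """Group a batch into per-server buffers, keeping each buffer in batch order."""
    buffers = defaultdict(list)
    for seq, (key, action) in enumerate(ops):
        buffers[server_for_key(key)].append((seq, key, action))
    return dict(buffers)


def arrival_order(buffers, flush_order):
    """Flatten the buffers in the order they happen to be flushed."""
    return [op for server in flush_order for op in buffers[server]]


def per_key_order_preserved(arrived):
    """Option 2: for each key, sequence numbers must arrive in increasing order."""
    last_seen = {}
    for seq, key, _ in arrived:
        if last_seen.get(key, -1) > seq:
            return False
        last_seen[key] = seq
    return True


def strict_order_preserved(arrived):
    """Option 3: the whole batch must arrive in its original order."""
    seqs = [seq for seq, _, _ in arrived]
    return seqs == sorted(seqs)


batch = [("k1", "insert"), ("k2", "insert"), ("k1", "update"), ("k2", "delete")]
buffers = partition_ops(batch, lambda k: "ts-1" if k == "k1" else "ts-2")
arrived = arrival_order(buffers, flush_order=["ts-2", "ts-1"])

print(per_key_order_preserved(arrived), strict_order_preserved(arrived))
```

Under these assumptions the per-row guarantee holds even though the batch as a whole arrives out of order, so the two options are distinguishable at least in this simplified model.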





[jira] [Resolved] (KUDU-1241) Add support for Kudu TIMESTAMPs to the Impala kudu scanners

2016-09-27 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-1241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved KUDU-1241.
---
   Resolution: Incomplete
Fix Version/s: n/a

Resolving since this is tracked on the Impala side (link above)

> Add support for Kudu TIMESTAMPs to the Impala kudu scanners
> ---
>
> Key: KUDU-1241
> URL: https://issues.apache.org/jira/browse/KUDU-1241
> Project: Kudu
>  Issue Type: New Feature
>  Components: impala
>Affects Versions: Public beta
>Reporter: David Alves
> Fix For: n/a
>
>
> We currently have the TIMESTAMP type on the Kudu side but still haven't added 
> support for it on the Impala side.





[jira] [Resolved] (KUDU-1216) Add integration for Spark DStream map and foreach partition

2016-09-27 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved KUDU-1216.
---
   Resolution: Duplicate
Fix Version/s: n/a

Closing as duplicate since I think this is implemented by the current Spark 
integration. Feel free to re-open if I misunderstood.

> Add integration for Spark DStream map and foreach partition
> ---
>
> Key: KUDU-1216
> URL: https://issues.apache.org/jira/browse/KUDU-1216
> Project: Kudu
>  Issue Type: New Feature
>  Components: integration
>Reporter: Ted Malaska
> Fix For: n/a
>
>
> This jira will add two implicit methods to Spark DStream
> 1. kuduForeachPartition
> 2. kuduMapPartitions
> These methods will act like the basic foreach/map partition but they will 
> provide the developer a live client to interact with Kudu
> These methods will be accessible from two different call points.
> 1. Scala DStream
> 2. KuduContext (which will work for Java)





[jira] [Resolved] (KUDU-1215) Add integration for Spark map and foreach partition

2016-09-27 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-1215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved KUDU-1215.
---
   Resolution: Duplicate
Fix Version/s: n/a

Resolving this as duplicate since I think the API that is currently available 
solves these use cases. Feel free to reopen if I misunderstood the feature 
described here.

> Add integration for Spark map and foreach partition
> ---
>
> Key: KUDU-1215
> URL: https://issues.apache.org/jira/browse/KUDU-1215
> Project: Kudu
>  Issue Type: New Feature
>  Components: integration
>Reporter: Ted Malaska
> Fix For: n/a
>
>
> This jira will add two implicit methods to Spark RDD
> 1. kuduForeachPartition
> 2. kuduMapPartitions
> These methods will act like the basic foreach/map partition but they will 
> provide the developer a live client to interact with Kudu
> These methods will be accessible from two different call points.
> 1. Scala RDD
> 2. KuduContext (which will work for Java)





[jira] [Commented] (KUDU-352) Decide on and implement within-batch ordering for client API

2016-09-27 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15526741#comment-15526741
 ] 

Todd Lipcon commented on KUDU-352:
--

[~aserbin] - do you think this issue is now fully implemented/resolved in both 
the Java and C++ clients? I think you're the latest to look at ordering 
semantics from the client API.

> Decide on and implement within-batch ordering for client API
> 
>
> Key: KUDU-352
> URL: https://issues.apache.org/jira/browse/KUDU-352
> Project: Kudu
>  Issue Type: New Feature
>  Components: client
>Affects Versions: M5
>Reporter: Vladimir Feinberg
>
> Currently, when the client applies a sequence of WriteOperations to a 
> session without flushing (within a single batch), the batcher runs tablet 
> location lookup asynchronously (see method Batcher::TabletLookupFinished). 
> Thus, it is possible that within the same batch, even with manual flushing, 
> the PerTSBuffer is flushed out of order (causing operations to arrive 
> out-of-order on the server side).
> A contract needs to be designed (and applied to both C++ and Java APIs) 
> regarding the strength of the ordering within the batches.
> Some options:
> 1. No order guaranteed (current). Client must manually flush between batches 
> to ensure order.
> 2. Per-row order guarantee - operations are sent to the server where for a 
> given key, the sequence of operations is preserved.
> 3. Strict ordering guarantee. Independent of keys, order of batch is matched.
> Things to consider:
> -> Is (2) different from (3)? With HybridTime, the client should only see 
> changes atomically on a per-batch level with concurrent reads. Then 
> between-row operations do not matter (until multi-row transactions are 
> introduced).
> -> A flexible version of the API that could include BarrierWriteOperations 
> which would allow the user to control order within batches themselves.
> -> Simplifying things entirely, removing all order (force the client to use a 
> transaction or flushes to ensure order).





[jira] [Resolved] (KUDU-99) Separate internal (e.g., storage related) column attributes from external ones and change APIs

2016-09-27 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-99?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved KUDU-99.
-
   Resolution: Won't Fix
Fix Version/s: n/a

Resolving this as won't fix, since the surgery required at this point would be 
pretty heavy and probably break wire compat. We can re-open if it becomes more 
urgent such that it's worth the compatibility issues.

> Separate internal (e.g., storage related) column attributes from external 
> ones and change APIs
> --
>
> Key: KUDU-99
> URL: https://issues.apache.org/jira/browse/KUDU-99
> Project: Kudu
>  Issue Type: New Feature
>  Components: client, master, tablet
>Affects Versions: Backlog
>Reporter: Alex Feinberg
> Fix For: n/a
>
>
> We currently use ColumnSchema (and its matching protobuf message 
> ColumnSchemaPB) to specify both logical (e.g., which data type, what 
> constitutes a key, is it nullable) and physical (e.g., compression codec, 
> encoding algorithm) attributes.
> We need an approach that allows the user to specify and alter these external 
> attributes via our "DDL" APIs, without having to include them in every single 
> message and data structure: e.g., when sending other data over the wire 
> (which may as well be encoded in a different format or sent as plain text).





[jira] [Updated] (KUDU-81) rpc-test TestConnectionKeepalive failure

2016-09-27 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-81?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated KUDU-81:

Issue Type: Bug  (was: New Feature)

> rpc-test TestConnectionKeepalive failure
> 
>
> Key: KUDU-81
> URL: https://issues.apache.org/jira/browse/KUDU-81
> Project: Kudu
>  Issue Type: Bug
>  Components: rpc, test
>Affects Versions: M4
>Reporter: Todd Lipcon
>Priority: Trivial
>  Labels: flaky
>
> Saw this fail once:
> {code}
> /var/lib/jenkins/workspace/kudu-test/BUILD_TYPE/LEAKCHECK/label/centos6-kudu/src/rpc/rpc-test.cc:155:
>  Failure
> Value of: metrics.num_server_connections_
>   Actual: 1
> Expected: 0
> Server should have 0 server connections
> {code}
> Probably just a timing issue in the test.





[jira] [Resolved] (KUDU-1214) Add Integration points for Spark, Spark Streaming, and Spark SQL

2016-09-27 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved KUDU-1214.
---
   Resolution: Duplicate
Fix Version/s: n/a

Hearing nothing, I'm going to assume that what we've already built satisfies 
this original JIRA. Feel free to open a new JIRA if there are feature requests 
against the Spark integration.

> Add Integration points for Spark, Spark Streaming, and Spark SQL
> 
>
> Key: KUDU-1214
> URL: https://issues.apache.org/jira/browse/KUDU-1214
> Project: Kudu
>  Issue Type: New Feature
>  Components: integration
>Reporter: Ted Malaska
> Fix For: n/a
>
> Attachments: KUDU-1214.1.patch
>
>
> This Jira will be broken up into four main JIRAs:
> 1. Add Support for Spark RDD map and foreach integration with Kudu
> 2. Add Support for Spark DStream map and foreach integration with Kudu
> 3. Add Support for Spark SQL defaultSource and push down predicates
> 4. Add documentation for all Spark Integrations





[jira] [Updated] (KUDU-1654) Python 3 Client Test Failure: test_table_column

2016-09-27 Thread Jordan Birdsell (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-1654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jordan Birdsell updated KUDU-1654:
--
Status: In Review  (was: Open)

> Python 3 Client Test Failure: test_table_column
> ---
>
> Key: KUDU-1654
> URL: https://issues.apache.org/jira/browse/KUDU-1654
> Project: Kudu
>  Issue Type: Bug
>  Components: python
>Affects Versions: Public beta
>Reporter: Jordan Birdsell
>Assignee: Jordan Birdsell
>
> Python 3 requires an explicit encoding to be specified when casting a string 
> to bytes; in Python 2, bytes is synonymous with str, so this is a non-issue. 
> This should be updated to use the compat module, which already accounts for 
> this difference with the frombytes method.
> self = 
> def test_table_column(self):
>     table = self.client.table(self.ex_table)
>     cols = [(table['key'], 'key', 'int32'),
>             (table[1], 'int_val', 'int32'),
>             (table[-1], 'unixtime_micros_val', 'unixtime_micros')]
> 
>     for col, name, type in cols:
> >       assert col.name == bytes(name)
> E       TypeError: string argument without an encoding





[jira] [Commented] (KUDU-1654) Python 3 Client Test Failure: test_table_column

2016-09-27 Thread Jordan Birdsell (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-1654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15525987#comment-15525987
 ] 

Jordan Birdsell commented on KUDU-1654:
---

https://gerrit.cloudera.org/#/c/4543/

> Python 3 Client Test Failure: test_table_column
> ---
>
> Key: KUDU-1654
> URL: https://issues.apache.org/jira/browse/KUDU-1654
> Project: Kudu
>  Issue Type: Bug
>  Components: python
>Affects Versions: Public beta
>Reporter: Jordan Birdsell
>Assignee: Jordan Birdsell
>
> Python 3 requires an explicit encoding to be specified when casting a string 
> to bytes; in Python 2, bytes is synonymous with str, so this is a non-issue. 
> This should be updated to use the compat module, which already accounts for 
> this difference with the frombytes method.
> self = 
> def test_table_column(self):
>     table = self.client.table(self.ex_table)
>     cols = [(table['key'], 'key', 'int32'),
>             (table[1], 'int_val', 'int32'),
>             (table[-1], 'unixtime_micros_val', 'unixtime_micros')]
> 
>     for col, name, type in cols:
> >       assert col.name == bytes(name)
> E       TypeError: string argument without an encoding





[jira] [Created] (KUDU-1654) Python 3 Client Test Failure: test_table_column

2016-09-27 Thread Jordan Birdsell (JIRA)
Jordan Birdsell created KUDU-1654:
-

 Summary: Python 3 Client Test Failure: test_table_column
 Key: KUDU-1654
 URL: https://issues.apache.org/jira/browse/KUDU-1654
 Project: Kudu
  Issue Type: Bug
  Components: python
Affects Versions: Public beta
Reporter: Jordan Birdsell
Assignee: Jordan Birdsell


Python 3 requires an explicit encoding to be specified when casting a string to 
bytes; in Python 2, bytes is synonymous with str, so this is a non-issue. This 
should be updated to use the compat module, which already accounts for this 
difference with the frombytes method.

self = 

def test_table_column(self):
    table = self.client.table(self.ex_table)
    cols = [(table['key'], 'key', 'int32'),
            (table[1], 'int_val', 'int32'),
            (table[-1], 'unixtime_micros_val', 'unixtime_micros')]

    for col, name, type in cols:
>       assert col.name == bytes(name)
E       TypeError: string argument without an encoding
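
The failure can be reproduced outside the test suite. The sketch below shows the Python 3 behavior and one way to encode explicitly. The helper `to_bytes` is hypothetical and written only for this illustration; it is not the compat module's frombytes method, whose signature is not shown in this report.

```python
def to_bytes(text, encoding="utf-8"):
    """Turn a str into bytes; pass bytes through unchanged."""
    if isinstance(text, bytes):
        # Already encoded (this is also what a Python 2 str would be).
        return text
    return text.encode(encoding)


name = "key"

try:
    bytes(name)  # Python 3 raises TypeError here: no encoding was given
    raised = False
except TypeError:
    raised = True

encoded = to_bytes(name)
print(raised, encoded)
```

With a helper like this, the assertion could compare `col.name` against the encoded name instead of calling `bytes(name)` directly.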






[jira] [Commented] (KUDU-1653) Python 3 Failing to decode serialized token

2016-09-27 Thread Jordan Birdsell (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15525935#comment-15525935
 ] 

Jordan Birdsell commented on KUDU-1653:
---

https://gerrit.cloudera.org/#/c/4542/

> Python 3 Failing to decode serialized token
> ---
>
> Key: KUDU-1653
> URL: https://issues.apache.org/jira/browse/KUDU-1653
> Project: Kudu
>  Issue Type: Bug
>  Components: python
>Affects Versions: 1.1.0
>Reporter: Jordan Birdsell
>Assignee: Jordan Birdsell
>
> Python 3 attempts to deserialize token into utf-8 string and causes failure:
> UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 141: 
> invalid start byte





[jira] [Updated] (KUDU-1653) Python 3 Failing to decode serialized token

2016-09-27 Thread Jordan Birdsell (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jordan Birdsell updated KUDU-1653:
--
Status: In Review  (was: Open)

> Python 3 Failing to decode serialized token
> ---
>
> Key: KUDU-1653
> URL: https://issues.apache.org/jira/browse/KUDU-1653
> Project: Kudu
>  Issue Type: Bug
>  Components: python
>Affects Versions: 1.1.0
>Reporter: Jordan Birdsell
>Assignee: Jordan Birdsell
>
> Python 3 attempts to deserialize token into utf-8 string and causes failure:
> UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 141: 
> invalid start byte





[jira] [Created] (KUDU-1653) Python 3 Failing to decode serialized token

2016-09-27 Thread Jordan Birdsell (JIRA)
Jordan Birdsell created KUDU-1653:
-

 Summary: Python 3 Failing to decode serialized token
 Key: KUDU-1653
 URL: https://issues.apache.org/jira/browse/KUDU-1653
 Project: Kudu
  Issue Type: Bug
  Components: python
Affects Versions: 1.1.0
Reporter: Jordan Birdsell
Assignee: Jordan Birdsell


Python 3 attempts to deserialize token into utf-8 string and causes failure:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 141: 
invalid start byte
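
The error class is easy to reproduce with any binary payload that contains a byte such as 0x80 where UTF-8 expects the start of a character. The sketch below uses a made-up four-byte value as a stand-in for a serialized token. The helper `as_token_bytes` is hypothetical and only illustrates treating the token as raw bytes rather than text; it is not the fix under review.

```python
# A stand-in for a serialized token: arbitrary binary data, not text.
token = b"\x08\x01\x80\x02"

try:
    token.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError as exc:
    decode_failed = True
    bad_position = exc.start  # index of the first undecodable byte


def as_token_bytes(value):
    """Accept a token only as raw bytes, never as decoded text."""
    if not isinstance(value, (bytes, bytearray)):
        raise TypeError("serialized token must be bytes")
    return bytes(value)


round_trip_ok = as_token_bytes(token) == token
print(decode_failed, bad_position, round_trip_ok)
```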





[jira] [Resolved] (KUDU-1650) Python - Python 3 GetUnixTimeMicros Symbol Not Recognized

2016-09-27 Thread Jordan Birdsell (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-1650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jordan Birdsell resolved KUDU-1650.
---
   Resolution: Not A Problem
Fix Version/s: n/a

Issue was related to not having the LD_LIBRARY_PATH set

> Python - Python 3 GetUnixTimeMicros Symbol Not Recognized
> -
>
> Key: KUDU-1650
> URL: https://issues.apache.org/jira/browse/KUDU-1650
> Project: Kudu
>  Issue Type: Bug
>  Components: python
>Affects Versions: 1.1.0
>Reporter: Jordan Birdsell
>Assignee: Jordan Birdsell
>Priority: Blocker
> Fix For: n/a
>
>
> Python 3 is raising symbol not recognized errors on the method 
> GetUnixTimeMicros.


