[jira] [Commented] (CASSANDRA-11542) Create a benchmark to compare HDFS and Cassandra bulk read times
[ https://issues.apache.org/jira/browse/CASSANDRA-11542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15290743#comment-15290743 ] Artem Aliev commented on CASSANDRA-11542: - [~Stefania], I have created https://datastax-oss.atlassian.net/browse/SPARKC-383 for your connector finding. If you have new code on that finding, please comment there. I will discuss the design of the change with the team and improve your proposal. > Create a benchmark to compare HDFS and Cassandra bulk read times > > > Key: CASSANDRA-11542 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11542 > Project: Cassandra > Issue Type: Sub-task > Components: Testing >Reporter: Stefania >Assignee: Stefania > Fix For: 3.x > > Attachments: jfr_recordings.zip, spark-load-perf-results-001.zip, > spark-load-perf-results-002.zip, spark-load-perf-results-003.zip > > > I propose creating a benchmark for comparing Cassandra and HDFS bulk reading > performance. Simple Spark queries will be performed on data stored in HDFS or > Cassandra, and the entire duration will be measured. An example query would > be the max or min of a column or a count\(*\). > This benchmark should allow determining the impact of: > * partition size > * number of clustering columns > * number of value columns (cells) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
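As a self-contained sketch of the measurement side of the proposal — the timing harness only; the Runnable stands in for the real Spark query against HDFS or Cassandra, and all names here are illustrative, not part of the actual benchmark:

```java
import java.util.ArrayList;
import java.util.List;

// Times a bulk-read job over several iterations and returns per-run
// wall-clock durations in nanoseconds. In the proposed benchmark the
// Runnable would be a Spark query such as max/min of a column or count(*).
final class ReadBenchmark {
    static List<Long> timeRuns(Runnable bulkRead, int iterations) {
        List<Long> durationsNanos = new ArrayList<>();
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            bulkRead.run();
            durationsNanos.add(System.nanoTime() - start);
        }
        return durationsNanos;
    }
}
```

Measuring the entire duration per run (rather than per-stage metrics) is what makes the HDFS and Cassandra paths directly comparable.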
[jira] [Commented] (CASSANDRA-11553) hadoop.cql3.CqlRecordWriter does not close cluster on reconnect
[ https://issues.apache.org/jira/browse/CASSANDRA-11553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237355#comment-15237355 ] Artem Aliev commented on CASSANDRA-11553: - CHANGES.txt: Always close cluster with connection in CqlRecordWriter (CASSANDRA-11553) > hadoop.cql3.CqlRecordWriter does not close cluster on reconnect > --- > > Key: CASSANDRA-11553 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11553 > Project: Cassandra > Issue Type: Bug >Reporter: Artem Aliev >Assignee: Artem Aliev > Fix For: 2.2.6, 3.5, 3.6, 3.0.6 > > Attachments: CASSANDRA-11553-2.2.txt > > > CASSANDRA-10058 added session and cluster close calls to all places in hadoop > except one place, on reconnection. > The writer uses one connection per new cluster, so I added a cluster.close() > call to the sessionClose() method. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-11553) hadoop.cql3.CqlRecordWriter does not close cluster on reconnect
[ https://issues.apache.org/jira/browse/CASSANDRA-11553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Aliev updated CASSANDRA-11553: Fix Version/s: 3.6 3.5 > hadoop.cql3.CqlRecordWriter does not close cluster on reconnect > --- > > Key: CASSANDRA-11553 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11553 > Project: Cassandra > Issue Type: Bug >Reporter: Artem Aliev >Assignee: Artem Aliev > Fix For: 2.2.6, 3.5, 3.6, 3.0.6 > > Attachments: CASSANDRA-11553-2.2.txt > > > CASSANDRA-10058 added session and cluster close calls to all places in hadoop > except one place, on reconnection. > The writer uses one connection per new cluster, so I added a cluster.close() > call to the sessionClose() method. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-11553) hadoop.cql3.CqlRecordWriter does not close cluster on reconnect
[ https://issues.apache.org/jira/browse/CASSANDRA-11553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Aliev updated CASSANDRA-11553: Fix Version/s: 3.0.6 2.2.6 > hadoop.cql3.CqlRecordWriter does not close cluster on reconnect > --- > > Key: CASSANDRA-11553 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11553 > Project: Cassandra > Issue Type: Bug >Reporter: Artem Aliev >Assignee: Artem Aliev > Fix For: 2.2.6, 3.0.6 > > Attachments: CASSANDRA-11553-2.2.txt > > > CASSANDRA-10058 added session and cluster close calls to all places in hadoop > except one place, on reconnection. > The writer uses one connection per new cluster, so I added a cluster.close() > call to the sessionClose() method. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-11553) hadoop.cql3.CqlRecordWriter does not close cluster on reconnect
[ https://issues.apache.org/jira/browse/CASSANDRA-11553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Aliev updated CASSANDRA-11553: Status: Patch Available (was: Open) > hadoop.cql3.CqlRecordWriter does not close cluster on reconnect > --- > > Key: CASSANDRA-11553 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11553 > Project: Cassandra > Issue Type: Bug >Reporter: Artem Aliev >Assignee: Artem Aliev > Attachments: CASSANDRA-11553-2.2.txt > > > CASSANDRA-10058 added session and cluster close calls to all places in hadoop > except one place, on reconnection. > The writer uses one connection per new cluster, so I added a cluster.close() > call to the sessionClose() method. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-11553) hadoop.cql3.CqlRecordWriter does not close cluster on reconnect
[ https://issues.apache.org/jira/browse/CASSANDRA-11553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Aliev updated CASSANDRA-11553: Reviewer: Aleksey Yeschenko > hadoop.cql3.CqlRecordWriter does not close cluster on reconnect > --- > > Key: CASSANDRA-11553 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11553 > Project: Cassandra > Issue Type: Bug >Reporter: Artem Aliev >Assignee: Artem Aliev > Attachments: CASSANDRA-11553-2.2.txt > > > CASSANDRA-10058 added session and cluster close calls to all places in hadoop > except one place, on reconnection. > The writer uses one connection per new cluster, so I added a cluster.close() > call to the sessionClose() method. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-11553) hadoop.cql3.CqlRecordWriter does not close cluster on reconnect
[ https://issues.apache.org/jira/browse/CASSANDRA-11553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Aliev updated CASSANDRA-11553: Attachment: CASSANDRA-11553-2.2.txt > hadoop.cql3.CqlRecordWriter does not close cluster on reconnect > --- > > Key: CASSANDRA-11553 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11553 > Project: Cassandra > Issue Type: Bug >Reporter: Artem Aliev >Assignee: Artem Aliev > Attachments: CASSANDRA-11553-2.2.txt > > > CASSANDRA-10058 added session and cluster close calls to all places in hadoop > except one place, on reconnection. > The writer uses one connection per new cluster, so I added a cluster.close() > call to the sessionClose() method. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-11553) hadoop.cql3.CqlRecordWriter does not close cluster on reconnect
Artem Aliev created CASSANDRA-11553: --- Summary: hadoop.cql3.CqlRecordWriter does not close cluster on reconnect Key: CASSANDRA-11553 URL: https://issues.apache.org/jira/browse/CASSANDRA-11553 Project: Cassandra Issue Type: Bug Reporter: Artem Aliev Assignee: Artem Aliev CASSANDRA-10058 added session and cluster close calls to all places in hadoop except one place, on reconnection. The writer uses one connection per new cluster, so I added a cluster.close() call to the sessionClose() method. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
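The leak the patch addresses can be modelled without a live cluster. Below is a minimal, self-contained sketch — the nested Cluster/Session classes are stand-ins for the DataStax java driver types, not the driver itself: closing only the Session on reconnect leaves the old Cluster, and with it the connection pool and its threads, open.

```java
// Self-contained model of the resource leak: a Session is owned by a
// Cluster, and closing the Session alone does not release the Cluster.
final class ResourceLeakDemo {
    static final class Cluster implements AutoCloseable {
        boolean closed;
        Session connect() { return new Session(this); }
        public void close() { closed = true; }
    }
    static final class Session implements AutoCloseable {
        final Cluster cluster;
        boolean closed;
        Session(Cluster c) { cluster = c; }
        public void close() { closed = true; }
    }
    // Before the patch: only the Session is closed on reconnect, so the old
    // Cluster (and its connection pool) stays open forever.
    static Cluster reconnectLeaky(Session old) {
        old.close();
        return new Cluster();
    }
    // After the patch: the owning Cluster is closed as well.
    static Cluster reconnectFixed(Session old) {
        old.close();
        old.cluster.close();
        return new Cluster();
    }
}
```

Since the writer creates one Cluster per reconnect, every reconnect under the leaky variant strands another full set of connections.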
[jira] [Comment Edited] (CASSANDRA-10835) CqlInputFormat creates too small splits for map Hadoop tasks
[ https://issues.apache.org/jira/browse/CASSANDRA-10835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15050388#comment-15050388 ] Artem Aliev edited comment on CASSANDRA-10835 at 12/10/15 9:07 AM: --- I like [~JoshuaMcKenzie]'s idea. See the new patches (v2) for the 3.0 and 2.2 branches. An MB-based param was added; if it is not set, the old algorithm is used. was (Author: artem.aliev): Joshua idea implementation, MB-based param was added > CqlInputFormat creates too small splits for map Hadoop tasks > - > > Key: CASSANDRA-10835 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10835 > Project: Cassandra > Issue Type: Bug >Reporter: Artem Aliev > Attachments: cassandra-2.2-10835-2.txt, cassandra-3.0.1-10835-2.txt, > cassandra-3.0.1-10835.txt > > > In C* versions < 2.2, CqlInputFormat used the number of rows to define the split size. > The default split size was 64K rows. > {code} > private static final int DEFAULT_SPLIT_SIZE = 64 * 1024; > {code} > The doc: > {code} > * You can also configure the number of rows per InputSplit with > * ConfigHelper.setInputSplitSize. The default split size is 64k rows. > {code} > The new split algorithm assumes that the split size is in bytes, so it creates really > small Hadoop map tasks by default (or with old configs). > There are two ways to fix it: > 1. Update the doc and increase the default value to something like 16MB. > 2. Make C* compatible with older versions. > I like the second option, as it will not surprise people who upgrade from > old versions. I do not expect a lot of new users to pick up Hadoop. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-10835) CqlInputFormat creates too small splits for map Hadoop tasks
[ https://issues.apache.org/jira/browse/CASSANDRA-10835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Aliev updated CASSANDRA-10835: Attachment: cassandra-3.0.1-10835-2.txt cassandra-2.2-10835-2.txt Implementation of Joshua's idea; an MB-based param was added > CqlInputFormat creates too small splits for map Hadoop tasks > - > > Key: CASSANDRA-10835 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10835 > Project: Cassandra > Issue Type: Bug >Reporter: Artem Aliev > Attachments: cassandra-2.2-10835-2.txt, cassandra-3.0.1-10835-2.txt, > cassandra-3.0.1-10835.txt > > > In C* versions < 2.2, CqlInputFormat used the number of rows to define the split size. > The default split size was 64K rows. > {code} > private static final int DEFAULT_SPLIT_SIZE = 64 * 1024; > {code} > The doc: > {code} > * You can also configure the number of rows per InputSplit with > * ConfigHelper.setInputSplitSize. The default split size is 64k rows. > {code} > The new split algorithm assumes that the split size is in bytes, so it creates really > small Hadoop map tasks by default (or with old configs). > There are two ways to fix it: > 1. Update the doc and increase the default value to something like 16MB. > 2. Make C* compatible with older versions. > I like the second option, as it will not surprise people who upgrade from > old versions. I do not expect a lot of new users to pick up Hadoop. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-10835) CqlInputFormat creates too small splits for map Hadoop tasks
Artem Aliev created CASSANDRA-10835: --- Summary: CqlInputFormat creates too small splits for map Hadoop tasks Key: CASSANDRA-10835 URL: https://issues.apache.org/jira/browse/CASSANDRA-10835 Project: Cassandra Issue Type: Bug Reporter: Artem Aliev In C* versions < 2.2, CqlInputFormat used the number of rows to define the split size. The default split size was 64K rows. {code} private static final int DEFAULT_SPLIT_SIZE = 64 * 1024; {code} The doc: {code} * You can also configure the number of rows per InputSplit with * ConfigHelper.setInputSplitSize. The default split size is 64k rows. {code} The new split algorithm assumes that the split size is in bytes, so it creates really small Hadoop map tasks by default (or with old configs). There are two ways to fix it: 1. Update the doc and increase the default value to something like 16MB. 2. Make C* compatible with older versions. I like the second option, as it will not surprise people who upgrade from old versions. I do not expect a lot of new users to pick up Hadoop. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
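To see the scale of the problem, assume (illustrative numbers, not from the ticket) a 64 GiB table with rows of roughly 1 KiB. Read as a row count, the 64 * 1024 default yields ~64 MiB splits; read as bytes, the same value yields 64 KiB splits — three orders of magnitude more map tasks for identical configuration:

```java
// Number of splits for a table of totalBytes when each split covers
// splitSizeBytes, rounding up. With ~1 KiB rows, DEFAULT_SPLIT_SIZE
// = 64 * 1024 interpreted as rows covers ~64 MiB per split; interpreted
// as bytes it covers only 64 KiB per split.
final class SplitMath {
    static long numSplits(long totalBytes, long splitSizeBytes) {
        return (totalBytes + splitSizeBytes - 1) / splitSizeBytes;
    }
}
```

For the 64 GiB example this is 1,024 splits under the old interpretation versus 1,048,576 under the new one, which is why old configs suddenly produce tiny Hadoop tasks.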
[jira] [Updated] (CASSANDRA-10835) CqlInputFormat creates too small splits for map Hadoop tasks
[ https://issues.apache.org/jira/browse/CASSANDRA-10835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Aliev updated CASSANDRA-10835: Attachment: cassandra-3.0.1-10835.txt Make C* compatible with older versions > CqlInputFormat creates too small splits for map Hadoop tasks > - > > Key: CASSANDRA-10835 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10835 > Project: Cassandra > Issue Type: Bug >Reporter: Artem Aliev > Attachments: cassandra-3.0.1-10835.txt > > > In C* versions < 2.2, CqlInputFormat used the number of rows to define the split size. > The default split size was 64K rows. > {code} > private static final int DEFAULT_SPLIT_SIZE = 64 * 1024; > {code} > The doc: > {code} > * You can also configure the number of rows per InputSplit with > * ConfigHelper.setInputSplitSize. The default split size is 64k rows. > {code} > The new split algorithm assumes that the split size is in bytes, so it creates really > small Hadoop map tasks by default (or with old configs). > There are two ways to fix it: > 1. Update the doc and increase the default value to something like 16MB. > 2. Make C* compatible with older versions. > I like the second option, as it will not surprise people who upgrade from > old versions. I do not expect a lot of new users to pick up Hadoop. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8577) Values of set types not loading correctly into Pig
[ https://issues.apache.org/jira/browse/CASSANDRA-8577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Aliev updated CASSANDRA-8577: --- Attachment: cassandra-2.1-8577.txt Values of set types not loading correctly into Pig -- Key: CASSANDRA-8577 URL: https://issues.apache.org/jira/browse/CASSANDRA-8577 Project: Cassandra Issue Type: Bug Reporter: Oksana Danylyshyn Assignee: Brandon Williams Fix For: 2.1.3 Attachments: cassandra-2.1-8577.txt Values of set types are not loading correctly from Cassandra (cql3 table, Native protocol v3) into Pig using CqlNativeStorage. When using Cassandra version 2.1.0 only empty values are loaded, and for newer versions (2.1.1 and 2.1.2) the following error is received: org.apache.cassandra.serializers.MarshalException: Unexpected extraneous bytes after set value at org.apache.cassandra.serializers.SetSerializer.deserializeForNativeProtocol(SetSerializer.java:94) Steps to reproduce: {code}cqlsh:socialdata> CREATE TABLE test ( key varchar PRIMARY KEY, tags set<varchar> ); cqlsh:socialdata> insert into test (key, tags) values ('key', {'Running', 'onestep4red', 'running'}); cqlsh:socialdata> select * from test; key | tags -+--- key | {'Running', 'onestep4red', 'running'} (1 rows){code} With version 2.1.0: {code}grunt> data = load 'cql://socialdata/test' using org.apache.cassandra.hadoop.pig.CqlNativeStorage(); grunt> dump data; (key,()){code} With version 2.1.2: {code}grunt> data = load 'cql://socialdata/test' using org.apache.cassandra.hadoop.pig.CqlNativeStorage(); grunt> dump data; org.apache.cassandra.serializers.MarshalException: Unexpected extraneous bytes after set value at org.apache.cassandra.serializers.SetSerializer.deserializeForNativeProtocol(SetSerializer.java:94) at org.apache.cassandra.serializers.SetSerializer.deserializeForNativeProtocol(SetSerializer.java:27) at org.apache.cassandra.hadoop.pig.AbstractCassandraStorage.cassandraToObj(AbstractCassandraStorage.java:796) at 
org.apache.cassandra.hadoop.pig.CqlStorage.cqlColumnToObj(CqlStorage.java:195) at org.apache.cassandra.hadoop.pig.CqlNativeStorage.getNext(CqlNativeStorage.java:106) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:211) at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532) at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370) at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212){code} Expected result: {code}(key,(Running,onestep4red,running)){code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
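The mechanics behind the MarshalException can be shown without the driver. In the native protocol, collection element counts and element lengths are 2-byte fields up to protocol V2 and 4-byte fields from V3 on; decoding V3-encoded bytes with the older rules leaves unread bytes behind, which is exactly the "extraneous bytes" error above. A simplified, self-contained sketch (not the actual SetSerializer code):

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Toy codec for a set<text> value: 2-byte count/lengths for protocol <= 2,
// 4-byte count/lengths for protocol >= 3.
final class CollectionCodec {
    static ByteBuffer encodeSet(List<String> items, int protocolVersion) {
        boolean wide = protocolVersion >= 3;
        int field = wide ? 4 : 2;
        List<byte[]> raw = new ArrayList<>();
        int total = field;
        for (String s : items) {
            byte[] b = s.getBytes(StandardCharsets.UTF_8);
            raw.add(b);
            total += field + b.length;
        }
        ByteBuffer buf = ByteBuffer.allocate(total);
        if (wide) buf.putInt(raw.size()); else buf.putShort((short) raw.size());
        for (byte[] b : raw) {
            if (wide) buf.putInt(b.length); else buf.putShort((short) b.length);
            buf.put(b);
        }
        buf.flip();
        return buf;
    }

    static List<String> decodeSet(ByteBuffer in, int protocolVersion) {
        ByteBuffer buf = in.duplicate();
        boolean wide = protocolVersion >= 3;
        int n = wide ? buf.getInt() : (buf.getShort() & 0xFFFF);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            int len = wide ? buf.getInt() : (buf.getShort() & 0xFFFF);
            byte[] b = new byte[len];
            buf.get(b);
            out.add(new String(b, StandardCharsets.UTF_8));
        }
        if (buf.hasRemaining())  // bytes left over: wrong protocol version
            throw new IllegalStateException("Unexpected extraneous bytes after set value");
        return out;
    }
}
```

Decoding a V3-encoded set with the V2 rules reads the first two bytes of the 4-byte count as zero elements, leaves the rest of the buffer unread, and fails exactly like the stack trace above.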
[jira] [Comment Edited] (CASSANDRA-8577) Values of set types not loading correctly into Pig
[ https://issues.apache.org/jira/browse/CASSANDRA-8577?page=com.atlassian.jira.plugin.system.issuetabpanelfocusedCommentId=14270942#comment-14270942 ] Artem Aliev edited comment on CASSANDRA-8577 at 1/9/15 12:08 PM: - To reproduce the bug with unit tests: 1. replace ./build/lib/jars/cassandra-driver-core-2.0.5.jar with cassandra-driver-core-2.1.3.jar 2. run the pig unit tests: ant pig-test -Dtest.name=CqlTableDataTypeTest {code} …. [junit] org.apache.cassandra.serializers.MarshalException: Unexpected extraneous bytes after list value [junit] at org.apache.cassandra.serializers.ListSerializer.deserializeForNativeProtocol(ListSerializer.java:104) [junit] at org.apache.cassandra.serializers.ListSerializer.deserializeForNativeProtocol(ListSerializer.java:27) [junit] at org.apache.cassandra.hadoop.pig.AbstractCassandraStorage.cassandraToObj(AbstractCassandraStorage.java:796) [junit] at org.apache.cassandra.hadoop.pig.CqlStorage.cqlColumnToObj(CqlStorage.java:195) [junit] at org.apache.cassandra.hadoop.pig.CqlNativeStorage.getNext(CqlNativeStorage.java:106) [junit] at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:211) [junit] at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532) [junit] at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67) [junit] at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143) [junit] at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764) [junit] at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370) [junit] at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212) …. {code} Cassandra 2.1 ships with java driver 2.0, which uses the V2 native protocol. Java driver 2.1 is available and uses the V3 native protocol. The collection serialisation changed in V3. 
The current implementation of the pig reader has protocol version 1 hardcoded for deserialisation, as a result of an incomplete fix of CASSANDRA-7287. Version 1 should only be used by the deprecated cql-over-thrift API. CqlNativeStorage uses the java driver protocol, so the patch passes the serialisation protocol version negotiated by the java driver to the deserialiser when CqlNativeStorage is used. I also added an optional ‘cassandra.input.native.protocol.version’ parameter to force the protocol version, just in case. was (Author: artem.aliev): to reproduce the bug with unit tests: 1 replace ./build/lib/jars/cassandra-driver-core-2.0.5.jar with cassandra-driver-core-2.0.5.jar 2 run pig unit tests ant pig-test -Dtest.name=CqlTableDataTypeTest {code} …. [junit] org.apache.cassandra.serializers.MarshalException: Unexpected extraneous bytes after list value [junit] at org.apache.cassandra.serializers.ListSerializer.deserializeForNativeProtocol(ListSerializer.java:104) [junit] at org.apache.cassandra.serializers.ListSerializer.deserializeForNativeProtocol(ListSerializer.java:27) [junit] at org.apache.cassandra.hadoop.pig.AbstractCassandraStorage.cassandraToObj(AbstractCassandraStorage.java:796) [junit] at org.apache.cassandra.hadoop.pig.CqlStorage.cqlColumnToObj(CqlStorage.java:195) [junit] at org.apache.cassandra.hadoop.pig.CqlNativeStorage.getNext(CqlNativeStorage.java:106) [junit] at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:211) [junit] at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532) [junit] at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67) [junit] at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143) [junit] at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764) [junit] at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370) [junit] at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212) …. 
{code} Cassandra 2.1 is shipped with java driver 2.0, that used V2 native protocol. The java driver 2.1 is available and it use V3 native protocol. The collection serialisation is changed in V3. Current implementation of pig reader has harcoded version 1 for deserialisation, as result of incomplete fix of CASSANDRA-7287. The version 1 should be used in cql-over-thrift deprecated API only. CqlNativeStorage use java driver protocol. So the patch passes the negotiated by java driver serialisation protocol to deserialiser in case CqlNativeStorage is used. I also add optional ‘cassandra.input.native.protocol.version’ parameter to force the protocol version, just in case. Values of set types not loading correctly into Pig -- Key: CASSANDRA-8577 URL: https://issues.apache.org/jira/browse/CASSANDRA-8577 Project: Cassandra Issue
[jira] [Resolved] (CASSANDRA-8471) mapred/hive queries fail when there is just 1 node down & RF is > 1
[ https://issues.apache.org/jira/browse/CASSANDRA-8471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Aliev resolved CASSANDRA-8471. Resolution: Duplicate Fix Version/s: (was: 2.1.3) (was: 2.0.12) mapred/hive queries fail when there is just 1 node down & RF is > 1 - Key: CASSANDRA-8471 URL: https://issues.apache.org/jira/browse/CASSANDRA-8471 Project: Cassandra Issue Type: Bug Components: Hadoop Reporter: Artem Aliev Labels: easyfix, hadoop, patch Attachments: cassandra-2.0-8471.txt The hive and map reduce queries fail when just 1 node is down, even with RF=3 (in a 6 node cluster) and default consistency levels for Read and Write. The simplest way to reproduce it is to use the DataStax integrated hadoop environment with hive. {quote} alter keyspace HiveMetaStore WITH replication = {'class':'NetworkTopologyStrategy', 'DC1':3} ; alter keyspace cfs WITH replication = {'class':'NetworkTopologyStrategy', 'DC1':3} ; alter keyspace cfs_archive WITH replication = {'class':'NetworkTopologyStrategy', 'DC1':3} ; CREATE KEYSPACE datamart WITH replication = { 'class': 'NetworkTopologyStrategy', 'DC1': '3' }; CREATE TABLE users1 ( id int, name text, PRIMARY KEY ((id)) ); {quote} Insert data. Shut down one cassandra node. Run a map reduce task; hive in this case: {quote} $ dse hive hive> use datamart; hive> select count(*) from users1; {quote} {quote} ... ... 2014-12-10 18:33:53,090 Stage-1 map = 75%, reduce = 25%, Cumulative CPU 6.39 sec 2014-12-10 18:33:54,093 Stage-1 map = 75%, reduce = 25%, Cumulative CPU 6.39 sec 2014-12-10 18:33:55,096 Stage-1 map = 75%, reduce = 25%, Cumulative CPU 6.39 sec 2014-12-10 18:33:56,099 Stage-1 map = 75%, reduce = 25%, Cumulative CPU 6.39 sec 2014-12-10 18:33:57,102 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 6.39 sec MapReduce Total cumulative CPU time: 6 seconds 390 msec Ended Job = job_201412100017_0006 with errors Error during job, obtaining debugging information... 
Job Tracking URL: http://i-9d0306706.c.eng-gce-support.internal:50030/jobdetails.jsp?jobid=job_201412100017_0006 Examining task ID: task_201412100017_0006_m_05 (and more) from job job_201412100017_0006 Task with the most failures(4): - Task ID: task_201412100017_0006_m_01 URL: http://i-9d0306706.c.eng-gce-support.internal:50030/taskdetails.jsp?jobid=job_201412100017_0006tipid=task_201412100017_0006_m_01 - Diagnostic Messages for this Task: java.io.IOException: java.io.IOException: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: i-6ac985f7d.c.eng-gce-support.internal/10.240.124.16:9042 (com.datastax.driver.core.TransportException: [i-6ac985f7d.c.eng-gce-support.internal/10.240.124.16:9042] Cannot connect)) at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97) at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57) at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:244) at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:538) at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.init(MapTask.java:197) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:418) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372) at org.apache.hadoop.mapred.Child$4.run(Child.java:266) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121) at org.apache.hadoop.mapred.Child.main(Child.java:260) Caused by: java.io.IOException: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: i-6ac985f7d.c.eng-gce-support.internal/10.240.124.16:9042 (com.datastax.driver.core.TransportException: 
[i-6ac985f7d.c.eng-gce-support.internal/10.240.124.16:9042] Cannot connect)) at org.apache.hadoop.hive.cassandra.cql3.input.HiveCqlInputFormat.getRecordReader(HiveCqlInputFormat.java:206) at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:241) ... 9 more Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: i-6ac985f7d.c.eng-gce-support.internal/10.240.124.16:9042 (com.datastax.driver.core.TransportException:
[jira] [Created] (CASSANDRA-8471) mapred/hive queries fail when there is just 1 node down & RF is > 1
Artem Aliev created CASSANDRA-8471: -- Summary: mapred/hive queries fail when there is just 1 node down & RF is > 1 Key: CASSANDRA-8471 URL: https://issues.apache.org/jira/browse/CASSANDRA-8471 Project: Cassandra Issue Type: Bug Components: Hadoop Reporter: Artem Aliev The hive and map reduce queries fail when just 1 node is down, even with RF=3 (in a 6 node cluster) and default consistency levels for Read and Write. The simplest way to reproduce it is to use the DataStax integrated hadoop environment with hive. {quote} alter keyspace HiveMetaStore WITH replication = {'class':'NetworkTopologyStrategy', 'DC1':3} ; alter keyspace cfs WITH replication = {'class':'NetworkTopologyStrategy', 'DC1':3} ; alter keyspace cfs_archive WITH replication = {'class':'NetworkTopologyStrategy', 'DC1':3} ; CREATE KEYSPACE datamart WITH replication = { 'class': 'NetworkTopologyStrategy', 'DC1': '3' }; CREATE TABLE users1 ( id int, name text, PRIMARY KEY ((id)) ); {quote} Insert data. Shut down one cassandra node. Run a map reduce task; hive in this case: {quote} $ dse hive hive> use datamart; hive> select count(*) from users1; {quote} {quote} ... ... 2014-12-10 18:33:53,090 Stage-1 map = 75%, reduce = 25%, Cumulative CPU 6.39 sec 2014-12-10 18:33:54,093 Stage-1 map = 75%, reduce = 25%, Cumulative CPU 6.39 sec 2014-12-10 18:33:55,096 Stage-1 map = 75%, reduce = 25%, Cumulative CPU 6.39 sec 2014-12-10 18:33:56,099 Stage-1 map = 75%, reduce = 25%, Cumulative CPU 6.39 sec 2014-12-10 18:33:57,102 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 6.39 sec MapReduce Total cumulative CPU time: 6 seconds 390 msec Ended Job = job_201412100017_0006 with errors Error during job, obtaining debugging information... 
Job Tracking URL: http://i-9d0306706.c.eng-gce-support.internal:50030/jobdetails.jsp?jobid=job_201412100017_0006 Examining task ID: task_201412100017_0006_m_05 (and more) from job job_201412100017_0006 Task with the most failures(4): - Task ID: task_201412100017_0006_m_01 URL: http://i-9d0306706.c.eng-gce-support.internal:50030/taskdetails.jsp?jobid=job_201412100017_0006tipid=task_201412100017_0006_m_01 - Diagnostic Messages for this Task: java.io.IOException: java.io.IOException: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: i-6ac985f7d.c.eng-gce-support.internal/10.240.124.16:9042 (com.datastax.driver.core.TransportException: [i-6ac985f7d.c.eng-gce-support.internal/10.240.124.16:9042] Cannot connect)) at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97) at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57) at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:244) at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:538) at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.init(MapTask.java:197) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:418) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372) at org.apache.hadoop.mapred.Child$4.run(Child.java:266) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121) at org.apache.hadoop.mapred.Child.main(Child.java:260) Caused by: java.io.IOException: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: i-6ac985f7d.c.eng-gce-support.internal/10.240.124.16:9042 (com.datastax.driver.core.TransportException: 
[i-6ac985f7d.c.eng-gce-support.internal/10.240.124.16:9042] Cannot connect)) at org.apache.hadoop.hive.cassandra.cql3.input.HiveCqlInputFormat.getRecordReader(HiveCqlInputFormat.java:206) at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:241) ... 9 more Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: i-6ac985f7d.c.eng-gce-support.internal/10.240.124.16:9042 (com.datastax.driver.core.TransportException: [i-6ac985f7d.c.eng-gce-support.internal/10.240.124.16:9042] Cannot connect)) at com.datastax.driver.core.ControlConnection.reconnectInternal(ControlConnection.java:196) at com.datastax.driver.core.ControlConnection.connect(ControlConnection.java:79) at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:1104) at com.datastax.driver.core.Cluster.init(Cluster.java:121) at
[jira] [Updated] (CASSANDRA-8471) mapred/hive queries fail when there is just 1 node down & RF is > 1
[ https://issues.apache.org/jira/browse/CASSANDRA-8471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Aliev updated CASSANDRA-8471: --- Attachment: cassandra-2.0-8471.txt mapred/hive queries fail when there is just 1 node down & RF is > 1 - Key: CASSANDRA-8471 URL: https://issues.apache.org/jira/browse/CASSANDRA-8471 Project: Cassandra Issue Type: Bug Components: Hadoop Reporter: Artem Aliev Attachments: cassandra-2.0-8471.txt The hive and map reduce queries fail when just 1 node is down, even with RF=3 (in a 6 node cluster) and default consistency levels for Read and Write. The simplest way to reproduce it is to use the DataStax integrated hadoop environment with hive. {quote} alter keyspace HiveMetaStore WITH replication = {'class':'NetworkTopologyStrategy', 'DC1':3} ; alter keyspace cfs WITH replication = {'class':'NetworkTopologyStrategy', 'DC1':3} ; alter keyspace cfs_archive WITH replication = {'class':'NetworkTopologyStrategy', 'DC1':3} ; CREATE KEYSPACE datamart WITH replication = { 'class': 'NetworkTopologyStrategy', 'DC1': '3' }; CREATE TABLE users1 ( id int, name text, PRIMARY KEY ((id)) ); {quote} Insert data. Shut down one cassandra node. Run a map reduce task; hive in this case: {quote} $ dse hive hive> use datamart; hive> select count(*) from users1; {quote} {quote} ... ... 2014-12-10 18:33:53,090 Stage-1 map = 75%, reduce = 25%, Cumulative CPU 6.39 sec 2014-12-10 18:33:54,093 Stage-1 map = 75%, reduce = 25%, Cumulative CPU 6.39 sec 2014-12-10 18:33:55,096 Stage-1 map = 75%, reduce = 25%, Cumulative CPU 6.39 sec 2014-12-10 18:33:56,099 Stage-1 map = 75%, reduce = 25%, Cumulative CPU 6.39 sec 2014-12-10 18:33:57,102 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 6.39 sec MapReduce Total cumulative CPU time: 6 seconds 390 msec Ended Job = job_201412100017_0006 with errors Error during job, obtaining debugging information... 
Job Tracking URL: http://i-9d0306706.c.eng-gce-support.internal:50030/jobdetails.jsp?jobid=job_201412100017_0006
Examining task ID: task_201412100017_0006_m_05 (and more) from job job_201412100017_0006
Task with the most failures(4):
-
Task ID: task_201412100017_0006_m_01
URL: http://i-9d0306706.c.eng-gce-support.internal:50030/taskdetails.jsp?jobid=job_201412100017_0006&tipid=task_201412100017_0006_m_01
-
Diagnostic Messages for this Task:
java.io.IOException: java.io.IOException: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: i-6ac985f7d.c.eng-gce-support.internal/10.240.124.16:9042 (com.datastax.driver.core.TransportException: [i-6ac985f7d.c.eng-gce-support.internal/10.240.124.16:9042] Cannot connect))
	at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
	at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
	at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:244)
	at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:538)
	at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:197)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:418)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
	at org.apache.hadoop.mapred.Child.main(Child.java:260)
Caused by: java.io.IOException: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: i-6ac985f7d.c.eng-gce-support.internal/10.240.124.16:9042 (com.datastax.driver.core.TransportException: [i-6ac985f7d.c.eng-gce-support.internal/10.240.124.16:9042] Cannot connect))
	at org.apache.hadoop.hive.cassandra.cql3.input.HiveCqlInputFormat.getRecordReader(HiveCqlInputFormat.java:206)
	at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:241)
	... 9 more
Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: i-6ac985f7d.c.eng-gce-support.internal/10.240.124.16:9042 (com.datastax.driver.core.TransportException: [i-6ac985f7d.c.eng-gce-support.internal/10.240.124.16:9042] Cannot connect))
	at com.datastax.driver.core.ControlConnection.reconnectInternal(ControlConnection.java:196)
	at com.datastax.driver.core.ControlConnection.connect(ControlConnection.java:79)
	at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:1104)
	at com.datastax.driver.core.Cluster.init(Cluster.java:121)
	at
[jira] [Commented] (CASSANDRA-8471) mapred/hive queries fail when there is just 1 node down RF is 1
[ https://issues.apache.org/jira/browse/CASSANDRA-8471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14244458#comment-14244458 ] Artem Aliev commented on CASSANDRA-8471:
----------------------------------------

CqlRecordReader is used to read data from C* into map tasks. To connect to C*, it receives a list of C* node locations. It is supposed to try all of those connections to find an available node for the control connection, but because the connect call was outside the check loop, the first node in the list was always selected; if that node was unavailable, the map task failed with the exception above. I moved the cluster.connect() call into the check loop.

mapred/hive queries fail when there is just 1 node down RF is 1
---------------------------------------------------------------

                 Key: CASSANDRA-8471
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8471
             Project: Cassandra
          Issue Type: Bug
          Components: Hadoop
            Reporter: Artem Aliev
              Labels: easyfix, hadoop, patch
             Fix For: 2.0.12, 2.1.3
         Attachments: cassandra-2.0-8471.txt
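The fix described in the comment above (moving the connect attempt inside the loop over candidate replica locations) can be sketched in plain Java. This is a minimal illustration, not the real CqlRecordReader code: connect(), the host lists, and the returned session string are hypothetical stand-ins for the DataStax driver API.

```java
import java.util.Arrays;
import java.util.List;

public class ConnectLoopSketch {

    // Hypothetical stand-in for the driver's cluster.connect(): succeeds only
    // when the host is in the "up" list, otherwise throws, like a TransportException.
    static String connect(String host, List<String> upHosts) {
        if (!upHosts.contains(host)) {
            throw new RuntimeException("Cannot connect to " + host);
        }
        return "session@" + host;
    }

    // Fixed shape: the connect attempt sits INSIDE the loop over candidate
    // locations, so a down node makes us fall through to the next replica.
    // (The bug was equivalent to connecting once to locations.get(0) only.)
    static String connectToAnyReplica(List<String> locations, List<String> upHosts) {
        RuntimeException last = null;
        for (String host : locations) {
            try {
                return connect(host, upHosts);   // moved into the check loop
            } catch (RuntimeException e) {
                last = e;                        // remember failure, try next node
            }
        }
        if (last == null) {
            last = new RuntimeException("no locations given");
        }
        throw last;  // NoHostAvailable-style failure: every replica was down
    }

    public static void main(String[] args) {
        List<String> locations = Arrays.asList("10.240.124.16", "10.240.124.17");
        List<String> up = Arrays.asList("10.240.124.17"); // first node is down
        System.out.println(connectToAnyReplica(locations, up)); // session@10.240.124.17
    }
}
```

With the buggy shape, a single down node fails the whole map task even when other replicas hold the split; with the attempt inside the loop, the reader simply falls through to the next replica location.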
[jira] [Comment Edited] (CASSANDRA-8471) mapred/hive queries fail when there is just 1 node down RF is 1
[ https://issues.apache.org/jira/browse/CASSANDRA-8471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14244458#comment-14244458 ] Artem Aliev edited comment on CASSANDRA-8471 at 12/12/14 5:11 PM:
------------------------------------------------------------------

CqlRecordReader is used to read data from C* into map tasks. To connect to C*, it receives a list of C* node locations where a given split (row) can be found. It is supposed to try all of those connections to find an available node for the control connection, but because the connect call was outside the check loop, the first node in the list was always selected; if that node was unavailable, the map task failed with the exception above. I moved the cluster.connect() call into the check loop.

was (Author: artem.aliev):
CqlRecordReader is used to read data from C* into map tasks. To connect to C*, it receives a list of C* node locations. It is supposed to try all of those connections to find an available node for the control connection, but because the connect call was outside the check loop, the first node in the list was always selected; if that node was unavailable, the map task failed with the exception above. I moved the cluster.connect() call into the check loop.

mapred/hive queries fail when there is just 1 node down RF is 1
---------------------------------------------------------------

                 Key: CASSANDRA-8471
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8471
             Project: Cassandra
          Issue Type: Bug
          Components: Hadoop
            Reporter: Artem Aliev
              Labels: easyfix, hadoop, patch
             Fix For: 2.0.12, 2.1.3
         Attachments: cassandra-2.0-8471.txt