
[jira] [Commented] (CASSANDRA-3762) AutoSaving KeyCache and System load time improvements.

2012-04-18 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13256813#comment-13256813
 ] 

Pavel Yaskevich commented on CASSANDRA-3762:


It seems like saving the cache's data positions (in combination with SSTable 
index summaries) to disk, to make it independent from sstable loading, is the 
only viable solution we have.

 AutoSaving KeyCache and System load time improvements.
 --

 Key: CASSANDRA-3762
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3762
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 1.2
Reporter: Vijay
Assignee: Vijay
Priority: Minor
 Fix For: 1.2

 Attachments: 0001-SavedKeyCache-load-time-improvements.patch


 CASSANDRA-2392 saves the index summary to disk... but when we have a saved 
 cache we will still scan through the index to get the data out.
 We might be able to separate this from SSTR.load and let it load the index 
 summary; once all the SSTs are loaded we might be able to check the 
 bloom filter and do random IO on fewer indexes to populate the KeyCache.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-4138) Add varint encoding to Serializing Cache

2012-04-18 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13257089#comment-13257089
 ] 

Pavel Yaskevich commented on CASSANDRA-4138:


OK, let's give Jonathan a chance to take a final look.

 Add varint encoding to Serializing Cache
 

 Key: CASSANDRA-4138
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4138
 Project: Cassandra
  Issue Type: Sub-task
  Components: Core
Affects Versions: 1.2
Reporter: Vijay
Assignee: Vijay
Priority: Minor
 Fix For: 1.2

 Attachments: 0001-CASSANDRA-4138-Take1.patch, 
 0001-CASSANDRA-4138-V2.patch, 0001-CASSANDRA-4138-v4.patch, 
 0002-sizeof-changes-on-rest-of-the-code.patch, CASSANDRA-4138-v3.patch





[jira] [Commented] (CASSANDRA-3909) Pig should handle wide rows

2012-04-16 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-3909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13255134#comment-13255134
 ] 

Pavel Yaskevich commented on CASSANDRA-3909:


+1

 Pig should handle wide rows
 ---

 Key: CASSANDRA-3909
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3909
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Reporter: Brandon Williams
Assignee: Brandon Williams
 Fix For: 1.1.1

 Attachments: 3909.txt


 Pig should be able to use the wide row support in CFIF.


[jira] [Commented] (CASSANDRA-4138) Add varint encoding to Serializing Cache

2012-04-16 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13255143#comment-13255143
 ] 

Pavel Yaskevich commented on CASSANDRA-4138:


To avoid confusion over the naming of the {write, read}VLong methods in EDIS 
and EDOS (it gives the feeling that writeInt doesn't really write an int 
anymore), I propose renaming them to {encode, decode}VInt. Furthermore, we 
could give a better feel for the encoding used by adding {VInt} as a prefix to 
both classes (alternatively, they could be moved to an o.a.c.u.vint package). 
I also think the DBConstants class should now be changed to only share 
sizeof(type) methods and become something like 
DBConstants.{native, encoded}.sizeof(type)...
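
To make the naming proposal concrete, here is a minimal sketch of what an 
explicit {encode, decode}VInt pair and an "encoded" sizeof could look like. It 
is not the code from the attached patch: the class VIntSketch and its helpers 
are hypothetical, and the byte layout used (zigzag plus 7-bit groups, as in 
Protocol Buffers) is a stand-in chosen for illustration, not necessarily the 
encoding EDIS/EDOS use.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical illustration only; not the EDIS/EDOS classes from the patch.
public class VIntSketch {
    // Writes value as 1..10 bytes: zigzag maps small negatives to small
    // unsigned numbers, then 7 bits per byte with a continuation flag.
    static void encodeVInt(long value, DataOutputStream out) throws IOException {
        long zigzag = (value << 1) ^ (value >> 63);
        while ((zigzag & ~0x7FL) != 0) {
            out.writeByte((int) ((zigzag & 0x7F) | 0x80));
            zigzag >>>= 7;
        }
        out.writeByte((int) zigzag);
    }

    // Reads back a value written by encodeVInt.
    static long decodeVInt(DataInputStream in) throws IOException {
        long zigzag = 0;
        int shift = 0;
        while (true) {
            int b = in.readUnsignedByte();
            zigzag |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0)
                break;
            shift += 7;
        }
        return (zigzag >>> 1) ^ -(zigzag & 1);
    }

    // Helper for experiments: the encoded bytes of a single value.
    static byte[] encode(long value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            encodeVInt(value, new DataOutputStream(bytes));
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // "Encoded" counterpart of a native sizeof(long), which is always 8.
    static int encodedSizeof(long value) {
        return encode(value).length;
    }

    static long roundTrip(long value) {
        try {
            return decodeVInt(new DataInputStream(new ByteArrayInputStream(encode(value))));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (long v : new long[] { 0, -1, 300, Long.MAX_VALUE, Long.MIN_VALUE })
            System.out.println(v + " -> " + encodedSizeof(v) + " byte(s), decoded " + roundTrip(v));
    }
}
```

With explicit names like these, a caller can tell at a glance whether a field 
is written in its native fixed width or in the variable-length form, which is 
the distinction the {native, encoded}.sizeof(type) split is meant to expose.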

 Add varint encoding to Serializing Cache
 

 Key: CASSANDRA-4138
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4138
 Project: Cassandra
  Issue Type: Sub-task
  Components: Core
Affects Versions: 1.2
Reporter: Vijay
Assignee: Vijay
Priority: Minor
 Fix For: 1.2

 Attachments: 0001-CASSANDRA-4138-Take1.patch





[jira] [Commented] (CASSANDRA-4142) OOM Exception during repair session with LeveledCompactionStrategy

2012-04-12 Thread Pavel Yaskevich (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-4142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252523#comment-13252523
]

Pavel Yaskevich commented on CASSANDRA-4142:

bq. The comments in CRAR say that it can't use super.read, so is the RAR buffer
wasted?

The buffer in CRAR is used to read compressed data from disk (instead of
creating a separate buffer each time), and it uses RAR.buffer for
decompression, so none of the buffers is wasted.

OOM Exception during repair session with LeveledCompactionStrategy
--

Key: CASSANDRA-4142
URL: https://issues.apache.org/jira/browse/CASSANDRA-4142
Project: Cassandra
Issue Type: Improvement
Components: Core
Affects Versions: 1.0.6
Environment: OS: Linux CentOs 6
JDK: Java HotSpot(TM) 64-Bit Server VM (build 14.0-b16, mixed mode)
Node configuration:
Quad-core
10 GB RAM
Xmx set to 2,5 GB (as computed by default).
Reporter: Romain Hardouin

We encountered an OOM exception on 2 nodes during a repair session.
Our CFs are set up to use LeveledCompactionStrategy and SnappyCompressor.
These two options used together may be the key to the problem.
Despite setting -XX:+HeapDumpOnOutOfMemoryError, no dump was generated.
Nonetheless, a memory analysis on a live node doing a repair reveals a
hotspot: an ArrayList of SSTableBoundedScanner which appears to contain as
many objects as there are SSTables on disk.
This ArrayList consumes 786 MB of the heap space for 5757 objects. Therefore
each object is about 140 KB.
Eclipse Memory Analyzer's dominator tree shows that 99% of an
SSTableBoundedScanner object's memory is consumed by a
CompressedRandomAccessReader which contains two big byte arrays.
Cluster information:
9 nodes
Each node handles 35 GB (RandomPartitioner)
This JIRA was created following this discussion:
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Why-so-many-SSTables-td7453033.html
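
The per-object figure in the report can be checked with a few lines of
arithmetic. The sketch below also compares it against one possible explanation
for the "two big byte arrays". That comparison rests on assumptions that are
not stated in the ticket: a 64 KB compression chunk length, one buffer of
exactly that size for decompressed data, and a second buffer sized to Snappy's
worst-case bound of 32 + n + n/6 bytes. The class and method names are
hypothetical, and MB is taken to mean 2^20 bytes.

```java
// Hypothetical back-of-the-envelope check; not code from Cassandra.
public class ScannerFootprint {
    static final long KB = 1024L;
    static final long MB = 1024L * KB;

    // Average heap cost of one object, given the total for the whole list.
    static long bytesPerObject(long totalBytes, long objectCount) {
        return totalBytes / objectCount;
    }

    // Worst-case compressed size of n input bytes as documented by Snappy.
    static long snappyMaxCompressedLength(long n) {
        return 32 + n + n / 6;
    }

    public static void main(String[] args) {
        long perObject = bytesPerObject(786 * MB, 5757);

        long chunkLength = 64 * KB; // ASSUMPTION: chunk length is not given in the ticket
        long twoBuffers = chunkLength + snappyMaxCompressedLength(chunkLength);

        System.out.println("reported bytes per scanner:  " + perObject);
        System.out.println("assumed size of two buffers: " + twoBuffers);
    }
}
```

If those assumptions hold, a pair of per-reader buffers would account for
nearly all of each scanner's footprint, which is consistent with the 99%
figure reported from the dominator tree.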


[jira] [Commented] (CASSANDRA-3997) Make SerializingCache Memory Pluggable

2012-04-02 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-3997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13244151#comment-13244151
 ] 

Pavel Yaskevich commented on CASSANDRA-3997:


Howard allocator uses even more memory (~300 MB more) than the standard 
allocator, but jemalloc buys us ~2.5 GB, which is pretty good. The last thing 
here would be to investigate what causes the free() segfaults with jemalloc, so 
different memory allocators could be used without any structural changes to 
the code... 

 Make SerializingCache Memory Pluggable
 --

 Key: CASSANDRA-3997
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3997
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Vijay
Assignee: Vijay
Priority: Minor
  Labels: cache
 Fix For: 1.2

 Attachments: 0001-CASSANDRA-3997.patch, jna.zip


 Serializing cache uses native malloc and free; by making FM pluggable, users 
 will have a choice of gcc malloc, TCMalloc or JEMalloc as needed. 
 Initial tests show less fragmentation in JEMalloc, but the only issue with it 
 is that both TCMalloc and JEMalloc are kind of single threaded (at least 
 they crash in my test otherwise).


[jira] [Commented] (CASSANDRA-4093) schema_* CFs do not respect column comparator which leads to CLI commands failure.

2012-03-31 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-4093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243139#comment-13243139
 ] 

Pavel Yaskevich commented on CASSANDRA-4093:


If users are never required to set it for everything to work, what is the 
benefit of adding a new field in the first place? 

 schema_* CFs do not respect column comparator which leads to CLI commands 
 failure.
 --

 Key: CASSANDRA-4093
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4093
 Project: Cassandra
  Issue Type: Bug
  Components: Tools
Affects Versions: 1.1.0
Reporter: Dave Brosius
Assignee: Sylvain Lebresne
 Fix For: 1.1.0

 Attachments: 4093.txt, CASSANDRA-4093-CD-changes.patch


 ColumnDefinition.{ascii, utf8, bool, ...} static methods used to initialize 
 schema_* CFs column_metadata do not respect CF comparator and use 
 ByteBufferUtil.bytes(...) for column names which creates problems in CLI and 
 probably in other places.
 The CompositeType validator throws an exception on the first column:
   String columnName = columnNameValidator.getString(columnDef.name);
 because it appears the composite type length header is wrong (25455) in 
 AbstractCompositeType.getWithShortLength:
 java.lang.IllegalArgumentException
   at java.nio.Buffer.limit(Buffer.java:247)
   at org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:50)
   at org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:59)
   at org.apache.cassandra.db.marshal.AbstractCompositeType.getString(AbstractCompositeType.java:139)
   at org.apache.cassandra.cli.CliClient.describeColumnFamily(CliClient.java:2046)
   at org.apache.cassandra.cli.CliClient.describeKeySpace(CliClient.java:1969)
   at org.apache.cassandra.cli.CliClient.executeShowKeySpaces(CliClient.java:1574)
 (seen in trunk)
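
 One plausible reading of the odd header value: a composite column name 
 starts each component with a 2-byte length, so if a plain (non-composite) 
 name is handed to a composite decoder, its first two characters are read as 
 that length. The bytes 'c' (0x63) and 'o' (0x6F) give 0x636F = 25455. The 
 sketch below reproduces this with java.nio only. The column name "comment" 
 and all class and method names are hypothetical; the ticket does not say 
 which column triggered the failure.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical reproduction; not the AbstractCompositeType code itself.
public class CompositeHeaderDemo {
    // A name serialized as raw UTF-8 bytes, with no composite framing.
    static ByteBuffer plainName(String name) {
        return ByteBuffer.wrap(name.getBytes(StandardCharsets.UTF_8));
    }

    // A single-component composite name: 2-byte length, value, end-of-component byte.
    static ByteBuffer compositeName(String name) {
        byte[] value = name.getBytes(StandardCharsets.UTF_8);
        ByteBuffer bb = ByteBuffer.allocate(2 + value.length + 1);
        bb.putShort((short) value.length).put(value).put((byte) 0);
        bb.flip();
        return bb;
    }

    // Interprets the first two bytes as an unsigned big-endian length.
    static int readShortLength(ByteBuffer bb) {
        return bb.getShort(bb.position()) & 0xFFFF;
    }

    // True if slicing out "length" bytes after the header is rejected by limit().
    static boolean sliceFails(ByteBuffer bb) {
        int length = readShortLength(bb);
        ByteBuffer copy = bb.duplicate();
        copy.position(copy.position() + 2);
        try {
            copy.limit(copy.position() + length); // IllegalArgumentException if past capacity
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        ByteBuffer raw = plainName("comment"); // ASSUMPTION: example name starting with "co"
        System.out.println("length header read from plain name: " + readShortLength(raw));
        System.out.println("plain name rejected:     " + sliceFails(raw));
        System.out.println("composite name rejected: " + sliceFails(compositeName("comment")));
    }
}
```

 Any plain name beginning with "co" would produce the same bogus length, and 
 the failing call is Buffer.limit, as in the trace above.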


[jira] [Commented] (CASSANDRA-4093) schema_* CFs do not respect column comparator which leads to CLI commands failure.

2012-03-31 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-4093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243161#comment-13243161
 ] 

Pavel Yaskevich commented on CASSANDRA-4093:


Sounds like it wasn't a good time to make the schema_* CFs use CQL3-style 
metadata, which breaks all other parties and causes a half-hacky field (despite 
the fact that it is not even useful yet) to be added to the Thrift structure 
just to support correct data display, even though it is ambiguous for users 
how/when to use it correctly...

 schema_* CFs do not respect column comparator which leads to CLI commands 
 failure.
 --

 Key: CASSANDRA-4093
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4093
 Project: Cassandra
  Issue Type: Bug
  Components: Tools
Affects Versions: 1.1.0
Reporter: Dave Brosius
Assignee: Sylvain Lebresne
 Fix For: 1.1.0

 Attachments: 4093.txt, CASSANDRA-4093-CD-changes.patch



[jira] [Commented] (CASSANDRA-4093) schema_* CFs do not respect column comparator which leads to CLI commands failure.

2012-03-31 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-4093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243177#comment-13243177
 ] 

Pavel Yaskevich commented on CASSANDRA-4093:


bq. Granted, which is why we need to make the cli aware of column aliases.

The problem is that it's not only about the CLI; it is also about all other 
possible clients, because users expect the comparator to be able to 
{de-}serialize column names correctly. So we should make it very clear how to 
work with this type of situation without making any special cases (e.g. for CT).

bq. Because we added those before we had cqlsh, so the cli was the only way to 
configure them. In retrospect, not a great idea.

I also don't think that having aliases in Thrift really justifies this 
situation.

 schema_* CFs do not respect column comparator which leads to CLI commands 
 failure.
 --

 Key: CASSANDRA-4093
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4093
 Project: Cassandra
  Issue Type: Bug
  Components: Tools
Affects Versions: 1.1.0
Reporter: Dave Brosius
Assignee: Sylvain Lebresne
 Fix For: 1.1.0

 Attachments: 4093.txt, CASSANDRA-4093-CD-changes.patch



[jira] [Commented] (CASSANDRA-4093) schema_* CFs do not respect column comparator which leads to CLI commands failure.

2012-03-31 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-4093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243200#comment-13243200
 ] 

Pavel Yaskevich commented on CASSANDRA-4093:


I'm fine with fixing it just for the CLI then.

 schema_* CFs do not respect column comparator which leads to CLI commands 
 failure.
 --

 Key: CASSANDRA-4093
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4093
 Project: Cassandra
  Issue Type: Bug
  Components: Tools
Affects Versions: 1.1.0
Reporter: Dave Brosius
Assignee: Sylvain Lebresne
 Fix For: 1.1.0

 Attachments: 4093.txt, CASSANDRA-4093-CD-changes.patch



[jira] [Commented] (CASSANDRA-4093) schema_* CFs do not respect column comparator which leads to CLI commands failure.

2012-03-30 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-4093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13242835#comment-13242835
 ] 

Pavel Yaskevich commented on CASSANDRA-4093:


Would users be required to manually set component_index on every metadata 
column they have after updating to 1.1? Or, to rephrase that: when are users 
going to be *required* to set the component_index field?

 schema_* CFs do not respect column comparator which leads to CLI commands 
 failure.
 --

 Key: CASSANDRA-4093
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4093
 Project: Cassandra
  Issue Type: Bug
  Components: Tools
Affects Versions: 1.1.0
Reporter: Dave Brosius
Assignee: Sylvain Lebresne
 Fix For: 1.1.0

 Attachments: 4093.txt, CASSANDRA-4093-CD-changes.patch



[jira] [Commented] (CASSANDRA-4093) schema_* CFs do not respect column comparator which leads to CLI commands failure.

2012-03-29 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-4093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13241503#comment-13241503
 ] 

Pavel Yaskevich commented on CASSANDRA-4093:


I don't get it: if the column names in column_metadata were not serialized 
using the given CF comparator, shouldn't we fix that instead of changing the CLI?

 schema_* CFs do not respect column comparator which leads to CLI commands 
 failure.
 --

 Key: CASSANDRA-4093
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4093
 Project: Cassandra
  Issue Type: Bug
  Components: Tools
Affects Versions: 1.1.0
Reporter: Dave Brosius
Assignee: Sylvain Lebresne
 Fix For: 1.1.0

 Attachments: CASSANDRA-4093-CD-changes.patch



[jira] [Commented] (CASSANDRA-4093) schema_* CFs do not respect column comparator which leads to CLI commands failure.

2012-03-29 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-4093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13241587#comment-13241587
 ] 

Pavel Yaskevich commented on CASSANDRA-4093:


I confirm that the CLI works as expected with the patch, but I'm not sure about 
adding a new option to ColumnDef, especially because users can simply ignore 
it without even giving it a thought. Feels like we are adding complexity out 
of thin air...

 schema_* CFs do not respect column comparator which leads to CLI commands 
 failure.
 --

 Key: CASSANDRA-4093
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4093
 Project: Cassandra
  Issue Type: Bug
  Components: Tools
Affects Versions: 1.1.0
Reporter: Dave Brosius
Assignee: Sylvain Lebresne
 Fix For: 1.1.0

 Attachments: 4093.txt, CASSANDRA-4093-CD-changes.patch



[jira] [Commented] (CASSANDRA-4093) schema_* CFs do not respect column comparator which leads to CLI commands failure.

2012-03-29 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-4093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13241913#comment-13241913
 ] 

Pavel Yaskevich commented on CASSANDRA-4093:


I like that more than adding a component_index option, which would be easily 
misconfigured and create confusion. Right now we should change the comparators 
of the schema_* CFs for that to work, and add a check to ThriftValidation and 
the CQL validations to prevent users from setting CompositeType comparators on 
the new CFs, along with a check when the schema is converted to 1.1...

 schema_* CFs do not respect column comparator which leads to CLI commands 
 failure.
 --

 Key: CASSANDRA-4093
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4093
 Project: Cassandra
  Issue Type: Bug
  Components: Tools
Affects Versions: 1.1.0
Reporter: Dave Brosius
Assignee: Sylvain Lebresne
 Fix For: 1.1.0

 Attachments: 4093.txt, CASSANDRA-4093-CD-changes.patch



[jira] [Commented] (CASSANDRA-3612) CQL inserting blank key.

2012-03-28 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-3612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13240531#comment-13240531
 ] 

Pavel Yaskevich commented on CASSANDRA-3612:


+1

 CQL inserting blank key.
 

 Key: CASSANDRA-3612
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3612
 Project: Cassandra
  Issue Type: Bug
  Components: API
Affects Versions: 1.0.0
 Environment: Linux ubuntu 3.0.0-12-generic #20-Ubuntu SMP Fri Oct 7 
 14:56:25 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux
Reporter: samal
Assignee: paul cannon
Priority: Minor
  Labels: cql
 Fix For: 1.0.9


 A bug in one of our applications inserted a blank key into the cluster, 
 causing an assertion error on the key. After checking the root cause, I found 
 it is a bug in CQL and is reproducible. Clients: cassandra-node and 
 cqlsh-1.0.6.
 A blank key only works when one column is provided.
  
 {}
 cqlsh> insert into login (KEY,email)values('','');
 cqlsh> select * from login;
 u'' | u'email',u'' 
 cqlsh> insert into login (KEY,email,verified)values('','','');
 Request did not complete within rpc_timeout.
 cqlsh> insert into login (KEY,verified)values('','');
 Request did not complete within rpc_timeout.
 cqlsh> insert into login (KEY,email)values('','');
 cqlsh> 
 cqlsh> select * from login;
 u'' | u'email',u'' | u'uid',None
 cqlsh> select * from login;
 u'' | u'email',u'' | u'uid',None
 cqlsh> select * from login;
 u'' | u'email',u'' | u'uid',None
 cqlsh> 
 cqlsh> select * from login;
 u'' | u'email',u'' | u'uid',None
 u'samalgo...@gmail.com' | u'email',u'samalgo...@gmail.com' | 
 u'password',u'388ad1c312a488ee9e12998fe097f2258fa8d5ee' | 
 u'uid',UUID('05ea41dc-241f-11e1-8521-3da59237b189') | u'verified',u'0'
 cqlsh> quit;
 {/}
 http://pastebin.com/HJn5fHhH


[jira] [Commented] (CASSANDRA-4042) add caching to CQL CF options

2012-03-27 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-4042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13239525#comment-13239525
 ] 

Pavel Yaskevich commented on CASSANDRA-4042:


+1

 add caching to CQL CF options
 ---

 Key: CASSANDRA-4042
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4042
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 1.1.0
Reporter: Pavel Yaskevich
Assignee: Sylvain Lebresne
Priority: Minor
 Fix For: 1.1.0

 Attachments: 4042_v2.txt, CASSANDRA-4042.patch


 Caching option is missing from CQL ColumnFamily options.


[jira] [Commented] (CASSANDRA-4087) Improve out-of-the-box cache settings

2012-03-27 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13239850#comment-13239850
 ] 

Pavel Yaskevich commented on CASSANDRA-4087:


It's not really necessary because the default value of key_cache_size_in_mb is 
set to auto in the Config class, so there is no chance of an NPE.

 Improve out-of-the-box cache settings
 -

 Key: CASSANDRA-4087
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4087
 Project: Cassandra
  Issue Type: Improvement
Affects Versions: 1.1.0
Reporter: Jonathan Ellis
Assignee: Pavel Yaskevich
 Fix For: 1.1.0

 Attachments: CASSANDRA-4087.patch


 The default key cache of 2MB is significantly smaller than <= 1.0 (200 rows 
 per CF) and much smaller than most production uses.  How about min(5% of the 
 heap, 100MB)?
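
As a rough illustration of the proposed default, the sketch below computes min(5% of the heap, 100MB) in bytes. The class and method names are made up for this example and are not taken from the attached patch.

```java
final class KeyCacheDefaultSketch {
    static final long MB = 1024L * 1024L;

    // Proposed default from the description: min(5% of the heap, 100MB), in bytes.
    static long defaultKeyCacheBytes(long maxHeapBytes) {
        return Math.min(maxHeapBytes / 20, 100 * MB);
    }

    public static void main(String[] args) {
        long heap = Runtime.getRuntime().maxMemory();
        System.out.println("heap=" + heap / MB + "MB, key cache="
                + defaultKeyCacheBytes(heap) / MB + "MB");
    }
}
```

With a 1GB heap the 5% term wins (about 51MB); from a 2GB heap upwards the 100MB cap applies.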


[jira] [Commented] (CASSANDRA-4087) Improve out-of-the-box cache settings

2012-03-27 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13239860#comment-13239860
 ] 

Pavel Yaskevich commented on CASSANDRA-4087:


Hm, I didn't know that, thanks!

 Improve out-of-the-box cache settings
 -

 Key: CASSANDRA-4087
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4087
 Project: Cassandra
  Issue Type: Improvement
Affects Versions: 1.1.0
Reporter: Jonathan Ellis
Assignee: Pavel Yaskevich
 Fix For: 1.1.0

 Attachments: CASSANDRA-4087.patch


 The default key cache of 2MB is significantly smaller than <= 1.0 (200 rows 
 per CF) and much smaller than most production uses.  How about min(5% of the 
 heap, 100MB)?


[jira] [Commented] (CASSANDRA-4093) cli - show keyspaces throws exception (and swallows) on column family schema_columnfamilies

2012-03-27 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-4093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13239956#comment-13239956
 ] 

Pavel Yaskevich commented on CASSANDRA-4093:


Can you please provide a simple test case for this problem? It doesn't seem to 
be CLI related but rather a CompositeType related problem...

 cli - show keyspaces throws exception (and swallows) on column family 
 schema_columnfamilies
 

 Key: CASSANDRA-4093
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4093
 Project: Cassandra
  Issue Type: Bug
  Components: Tools
Reporter: Dave Brosius
Assignee: Pavel Yaskevich
Priority: Trivial

 the CompositeType validator throws exception on first column
 String columnName = columnNameValidator.getString(columnDef.name);
 Because it appears the composite type length header is wrong (25455)
 AbstractCompositeType.getWithShortLength
 java.lang.IllegalArgumentException
   at java.nio.Buffer.limit(Buffer.java:247)
   at org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:50)
   at org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:59)
   at org.apache.cassandra.db.marshal.AbstractCompositeType.getString(AbstractCompositeType.java:139)
   at org.apache.cassandra.cli.CliClient.describeColumnFamily(CliClient.java:2046)
   at org.apache.cassandra.cli.CliClient.describeKeySpace(CliClient.java:1969)
   at org.apache.cassandra.cli.CliClient.executeShowKeySpaces(CliClient.java:1574)
 (seen in trunk)
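
To make the failure mode concrete, here is a minimal sketch of a reader that takes a 2-byte length header and then slices that many bytes, which is roughly what the name getWithShortLength suggests. It is not Cassandra's implementation; the class, the method names and the sample input "comment" are assumptions for illustration. One observation that is pure arithmetic: 25455 is 0x636F, the ASCII bytes 'c' and 'o', so a plain (non-composite) name starting with "co" read as a composite would yield exactly this header. Whether that is what happened here is not confirmed by the report.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class ShortLengthSketch {
    // Reads the unsigned 2-byte length header without moving the buffer's position.
    static int lengthHeader(ByteBuffer bb) {
        return bb.getShort(bb.position()) & 0xFFFF;
    }

    // Hypothetical reader: returns the next length-prefixed component as a view.
    static ByteBuffer readWithShortLength(ByteBuffer bb) {
        int length = bb.getShort() & 0xFFFF;
        ByteBuffer component = bb.duplicate();
        // Buffer.limit throws IllegalArgumentException if the new limit exceeds the capacity.
        component.limit(component.position() + length);
        bb.position(bb.position() + length);
        return component;
    }

    public static void main(String[] args) {
        // A plain ASCII name, not a serialized composite (hypothetical input).
        ByteBuffer plainName = ByteBuffer.wrap("comment".getBytes(StandardCharsets.US_ASCII));
        System.out.println(lengthHeader(plainName)); // 'c' = 0x63, 'o' = 0x6F, so 0x636F = 25455
        try {
            readWithShortLength(plainName);
        } catch (IllegalArgumentException e) {
            System.out.println("bogus length header rejected by Buffer.limit");
        }
    }
}
```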


[jira] [Commented] (CASSANDRA-3997) Make SerializingCache Memory Pluggable

2012-03-26 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-3997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238524#comment-13238524
 ] 

Pavel Yaskevich commented on CASSANDRA-3997:


Vijay, can you please also test the Hoard Memory Allocator (http://www.hoard.org/) 
as a comparison to jemalloc?

 Make SerializingCache Memory Pluggable
 --

 Key: CASSANDRA-3997
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3997
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Vijay
Assignee: Vijay
Priority: Minor
  Labels: cache
 Fix For: 1.2

 Attachments: 0001-CASSANDRA-3997.patch, jna.zip


 Serializing cache uses native malloc and free. By making FM pluggable, users 
 will have a choice of gcc malloc, TCMalloc or JEMalloc as needed. 
 Initial tests show less fragmentation in JEMalloc, but the only issue with it 
 is that both TCMalloc and JEMalloc are kind of single threaded (at least 
 they crash in my test otherwise).


[jira] [Commented] (CASSANDRA-4042) add caching to CQL CF options

2012-03-22 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-4042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13235976#comment-13235976
 ] 

Pavel Yaskevich commented on CASSANDRA-4042:


I'm fine if you just attach the bloom_filter_fp_chance addition here as a 
separate patch and we just close CASSANDRA-3941 as Duplicate, or vice versa, 
whichever you prefer. :)

 add caching to CQL CF options
 ---

 Key: CASSANDRA-4042
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4042
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 1.1.0
Reporter: Pavel Yaskevich
Assignee: Pavel Yaskevich
Priority: Minor
 Fix For: 1.1.0

 Attachments: CASSANDRA-4042.patch


 Caching option is missing from CQL ColumnFamily options.


[jira] [Commented] (CASSANDRA-4017) Unify migrations

2012-03-16 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-4017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13231612#comment-13231612
 ] 

Pavel Yaskevich commented on CASSANDRA-4017:


+1

 Unify migrations
 

 Key: CASSANDRA-4017
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4017
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 1.1.0
Reporter: Jonathan Ellis
Assignee: Sylvain Lebresne
 Fix For: 1.1.0


 Now that we can send a schema as a RowMutation, there's no need to keep 
 separate add/drop/update migration classes around.  Let's just send the 
 schema to our counterparts and let them figure out what changed.  Currently 
 we have "figure out what changed" code both to generate migrations on the 
 sender and to apply them on the target, which adds complexity.
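
A minimal sketch of what "figure out what changed" could look like on the receiving side, assuming a schema can be viewed as a map from column family name to some serialized definition. The map representation and every name below are assumptions for illustration, not the RowMutation-based code in Cassandra.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class SchemaDiffSketch {
    // Names present only in the incoming (remote) schema.
    static Set<String> added(Map<String, String> local, Map<String, String> remote) {
        Set<String> names = new TreeSet<>(remote.keySet());
        names.removeAll(local.keySet());
        return names;
    }

    // Names present only in the local schema.
    static Set<String> dropped(Map<String, String> local, Map<String, String> remote) {
        Set<String> names = new TreeSet<>(local.keySet());
        names.removeAll(remote.keySet());
        return names;
    }

    // Names present on both sides whose definitions differ.
    static Set<String> updated(Map<String, String> local, Map<String, String> remote) {
        Set<String> names = new TreeSet<>();
        for (Map.Entry<String, String> e : remote.entrySet()) {
            String old = local.get(e.getKey());
            if (old != null && !old.equals(e.getValue())) {
                names.add(e.getKey());
            }
        }
        return names;
    }
}
```

With a single diff like this on the target, the sender only has to ship the schema itself.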


[jira] [Commented] (CASSANDRA-2975) Upgrade MurmurHash to version 3

2012-03-15 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13230667#comment-13230667
 ] 

Pavel Yaskevich commented on CASSANDRA-2975:


Vijay, can you please rebase?

 Upgrade MurmurHash to version 3
 ---

 Key: CASSANDRA-2975
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2975
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Brian Lindauer
Assignee: Vijay
Priority: Trivial
  Labels: lhf
 Fix For: 1.2

 Attachments: 0001-CASSANDRA-2975.patch, 
 0001-Convert-BloomFilter-to-use-MurmurHash-v3-instead-of-.patch, 
 0002-Backwards-compatibility-with-files-using-Murmur2-blo.patch, 
 Murmur3Benchmark.java


 MurmurHash version 3 was finalized on June 3. It provides an enormous speedup 
 and increased robustness over version 2, which is implemented in Cassandra. 
 Information here:
 http://code.google.com/p/smhasher/
 The reference implementation is here:
 http://code.google.com/p/smhasher/source/browse/trunk/MurmurHash3.cpp?spec=svn136r=136
 I have already done the work to port the (public domain) reference 
 implementation to Java in the MurmurHash class and updated the BloomFilter 
 class to use the new implementation:
 https://github.com/lindauer/cassandra/commit/cea6068a4a3e5d7d9509335394f9ef3350d37e93
 Apart from the faster hash time, the new version only requires one call to 
 hash() rather than 2, since it returns 128 bits of hash instead of 64.
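
One common way to use a single 128-bit hash in a bloom filter is double hashing: split the result into two 64-bit halves and derive every bucket index from them. The sketch below shows that technique under the assumption that the two halves are already available as longs; it does not implement MurmurHash3 and is not taken from the linked commit.

```java
final class DoubleHashingSketch {
    // Derives hashCount bucket indexes from the two 64-bit halves of one 128-bit hash.
    static long[] bucketIndexes(long hash1, long hash2, int hashCount, long numBits) {
        long[] indexes = new long[hashCount];
        for (int i = 0; i < hashCount; i++) {
            // floorMod keeps the index in [0, numBits) even if the sum is negative or overflows.
            indexes[i] = Math.floorMod(hash1 + i * hash2, numBits);
        }
        return indexes;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(bucketIndexes(5L, 3L, 4, 10L))); // [5, 8, 1, 4]
    }
}
```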


[jira] [Commented] (CASSANDRA-3954) Exceptions during start up after schema disagreement

2012-03-14 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-3954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229133#comment-13229133
 ] 

Pavel Yaskevich commented on CASSANDRA-3954:


Mariusz, can you please provide additional information: do you use secondary 
indexes, and what is the name of the SSTable it fails to open (it should be 
logged just above the exception as "Opening sstable-name (n bytes)")?  My 
current guess is that it could be related to the secondary indexes.

 Exceptions during start up after schema disagreement
 

 Key: CASSANDRA-3954
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3954
 Project: Cassandra
  Issue Type: Bug
Reporter: Mariusz
Assignee: Pavel Yaskevich

 Hi,
 I've got a schema disagreement after dropping a keyspace.
 I switched off one of the nodes in the cluster, and after starting it again 
 I got a bunch of these exceptions:
 {code}
 ERROR [SSTableBatchOpen:1] 2012-02-24 14:21:00,759 AbstractCassandraDaemon.java (line 134) Fatal exception in thread Thread[SSTableBatchOpen:1,5,main]
 java.lang.ClassCastException: java.math.BigInteger cannot be cast to java.nio.ByteBuffer
 at org.apache.cassandra.db.marshal.UTF8Type.compare(UTF8Type.java:27)
 at org.apache.cassandra.dht.LocalToken.compareTo(LocalToken.java:45)
 at org.apache.cassandra.db.DecoratedKey.compareTo(DecoratedKey.java:89)
 at org.apache.cassandra.db.DecoratedKey.compareTo(DecoratedKey.java:38)
 at java.util.TreeMap.getEntry(TreeMap.java:328)
 at java.util.TreeMap.containsKey(TreeMap.java:209)
 at java.util.TreeSet.contains(TreeSet.java:217)
 at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:393)
 at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:189)
 at org.apache.cassandra.io.sstable.SSTableReader$1.run(SSTableReader.java:227)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
 at java.util.concurrent.FutureTask.run(FutureTask.java:138)
 at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
 {code}
 and this one at the end of start up:
 {code}
 ERROR [MigrationStage:1] 2012-02-24 14:37:22,750 AbstractCassandraDaemon.java (line 134) Fatal exception in thread Thread[MigrationStage:1,5,main]
 java.lang.NullPointerException
 at org.apache.cassandra.db.migration.MigrationHelper.addColumnFamily(MigrationHelper.java:282)
 at org.apache.cassandra.db.migration.MigrationHelper.addColumnFamily(MigrationHelper.java:216)
 at org.apache.cassandra.db.DefsTable.mergeColumnFamilies(DefsTable.java:330)
 at org.apache.cassandra.db.DefsTable.mergeRemoteSchema(DefsTable.java:240)
 at org.apache.cassandra.service.MigrationManager$1.runMayThrow(MigrationManager.java:124)
 at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
 at java.util.concurrent.FutureTask.run(FutureTask.java:138)
 at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
 {code}
 Any ideas why they've appeared?


[jira] [Commented] (CASSANDRA-3996) Keys index skips results

2012-03-13 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228431#comment-13228431
 ] 

Pavel Yaskevich commented on CASSANDRA-3996:


yeah, this is a real thing. Dmitry, can you work on the test-case? I have 
already committed it to 1.1.1, should I revert?

 Keys index skips results
 

 Key: CASSANDRA-3996
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3996
 Project: Cassandra
  Issue Type: Bug
Reporter: Dmitry Petrashko
 Fix For: 1.1.1

 Attachments: KeysSearcher_fix_and_refactor-v2.patch, 
 KeysSearcher_fix_and_refactor.patch


 While scanning a results page, if the range index meets a result already seen 
 in the previous result set, it decreases columnsRead. That causes the next 
 iteration to treat columnsRead < rowsPerQuery as if the last page was not 
 full and the scan is done.
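
The effect can be reproduced with a simplified model of paging, sketched below. It is not the KeysSearcher code: the list-backed index, the inclusive page start and all names are assumptions, chosen only so that each page begins with the last key of the previous one. With the decrement enabled, a full page looks short and the scan stops early.

```java
import java.util.ArrayList;
import java.util.List;

final class PagingSketch {
    // Returns up to rowsPerQuery keys starting at index `from` (inclusive).
    static List<Integer> fetchPage(List<Integer> index, int from, int rowsPerQuery) {
        int start = Math.min(from, index.size());
        int end = Math.min(from + rowsPerQuery, index.size());
        return index.subList(start, end);
    }

    static List<Integer> scan(List<Integer> index, int rowsPerQuery, boolean decrementOnDuplicate) {
        if (rowsPerQuery < 2) throw new IllegalArgumentException("rowsPerQuery must be at least 2");
        List<Integer> results = new ArrayList<>();
        Integer lastSeen = null;
        int from = 0;
        while (true) {
            List<Integer> page = fetchPage(index, from, rowsPerQuery);
            int columnsRead = page.size();
            for (Integer key : page) {
                if (key.equals(lastSeen)) {
                    // Already returned with the previous page.
                    if (decrementOnDuplicate) columnsRead--; // the problematic decrement
                    continue;
                }
                results.add(key);
            }
            // A short page is taken to mean that the scan is done.
            if (columnsRead < rowsPerQuery) break;
            lastSeen = page.get(page.size() - 1);
            from += rowsPerQuery - 1; // next page starts at the last key, inclusive
        }
        return results;
    }

    public static void main(String[] args) {
        List<Integer> index = new ArrayList<>();
        for (int i = 1; i <= 10; i++) index.add(i);
        System.out.println(scan(index, 4, true).size());  // 7: keys 8..10 are skipped
        System.out.println(scan(index, 4, false).size()); // 10
    }
}
```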


[jira] [Commented] (CASSANDRA-3985) Ensure a directory is selected for Compaction

2012-03-07 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-3985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13224484#comment-13224484
 ] 

Pavel Yaskevich commented on CASSANDRA-3985:


Question - what is the reason why we only return true from ensureFreeSpace 
if action is not user defined?

A few styling issues:

{code}
public synchronized static String getDataFileLocationForTable(String table, long expectedCompactedFileSize,
  boolean ensureFreeSpace )
{code}

should be changed to 

{code}
public synchronized static String getDataFileLocationForTable(String table,
                                                              long expectedCompactedFileSize,
                                                              boolean ensureFreeSpace)
{code}

or all arguments written on the same line.

Also, we don't use spaces to delimit operands, e.g.

{code}
for ( int i = 0 ; i < dataDirectoryForTable.length ; i++ )
{code}

I can see those styling problems inside the getDataFileLocationForTable(...) 
method.


 Ensure a directory is selected for Compaction
 -

 Key: CASSANDRA-3985
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3985
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.0.7
Reporter: Aaron Morton
Assignee: Aaron Morton
Priority: Minor
 Attachments: cassandra-1.0-3985.txt


 From http://www.mail-archive.com/user@cassandra.apache.org/msg20757.html
 CompactionTask.execute() checks if there is a valid compactionFileLocation 
 only if partialCompactionsAcceptable(). upgradesstables results in a 
 CompactionTask with userdefined set, so the valid location check is not 
 performed. 
 The result is an NPE, partial stack:
 {code:java}
 $ nodetool -h localhost upgradesstables
 Error occured while upgrading the sstables for keyspace MyKeySpace
 java.util.concurrent.ExecutionException: java.lang.NullPointerException
 at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
 at java.util.concurrent.FutureTask.get(FutureTask.java:83)
 at org.apache.cassandra.db.compaction.CompactionManager.performAllSSTableOperation(CompactionManager.java:203)
 at org.apache.cassandra.db.compaction.CompactionManager.performSSTableRewrite(CompactionManager.java:219)
 at org.apache.cassandra.db.ColumnFamilyStore.sstablesRewrite(ColumnFamilyStore.java:995)
 at org.apache.cassandra.service.StorageService.upgradeSSTables(StorageService.java:1648)
 snip
 Caused by: java.lang.NullPointerException
 at java.io.File.<init>(File.java:222)
 at org.apache.cassandra.db.ColumnFamilyStore.getTempSSTablePath(ColumnFamilyStore.java:641)
 at org.apache.cassandra.db.ColumnFamilyStore.getTempSSTablePath(ColumnFamilyStore.java:652)
 at org.apache.cassandra.db.ColumnFamilyStore.createCompactionWriter(ColumnFamilyStore.java:1888)
 at org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:151)
 at org.apache.cassandra.db.compaction.CompactionManager$4.perform(CompactionManager.java:229)
 at org.apache.cassandra.db.compaction.CompactionManager$2.call(CompactionManager.java:182)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
 at java.util.concurrent.FutureTask.run(FutureTask.java:138)
 {code}
 (night time here, will fix tomorrow, anyone else feel free to fix it.)


[jira] [Commented] (CASSANDRA-3950) support trickling fsync() on writes

2012-02-29 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-3950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13219012#comment-13219012
 ] 

Pavel Yaskevich commented on CASSANDRA-3950:


+1 with one final change required: the default of 100 should be changed to 
10240 (10 MB expressed in kilobytes; it is multiplied by 1024 in the SW 
constructor, which gives 10 MB in bytes = 10485760). 
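
The unit arithmetic in the comment, spelled out. The method name is hypothetical; only the factor of 1024 and the two values (100 and 10240) come from the comment itself.

```java
final class TrickleIntervalSketch {
    // The setting is given in kilobytes and multiplied by 1024 to get bytes.
    static long intervalBytes(long intervalInKb) {
        return intervalInKb * 1024L;
    }

    public static void main(String[] args) {
        System.out.println(intervalBytes(10240)); // 10485760 bytes = 10 MB
        System.out.println(intervalBytes(100));   // 102400 bytes = 100 KB
    }
}
```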

 support trickling fsync() on writes
 ---

 Key: CASSANDRA-3950
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3950
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Peter Schuller
Assignee: Peter Schuller
 Fix For: 1.1.0

 Attachments: CASSANDRA-3950-1.1-v2.txt, CASSANDRA-3950-1.1-v3.txt, 
 CASSANDRA-3950-1.1.txt


 Attaching a patch to support fsync():ing every N megabytes of data written 
 using sequential writers. The motivation is to avoid the kernel flushing out 
 pages in bulk.
 It makes sense for both platters and SSD:s, but it's particularly good for 
 SSD:s because the negative consequences of fsync():ing more often are much 
 more limited than with platters, and the *need* is to some extent greater 
 because of the fact that with SSD:s you're much more likely to be e.g. 
 streaming data quickly or compacting quickly, since you're not having to 
 throttle everything as extremely as with platters, and you easily write fast 
 enough for this to be a problem if you're targeting good latency at the 
 outliers.
 I'm nominating it for 1.1.0 because, if disabled, the probability of this 
 being a regression seems very low.
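
A minimal sketch of the idea, assuming a sequential writer built on java.nio's FileChannel: count the bytes written and call force() once a configurable interval has accumulated, so dirty pages are handed to the disk in small batches. This is not the attached patch; the class, its fields and the demo values are assumptions for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class TrickleFsyncWriter implements AutoCloseable {
    private final FileChannel channel;
    private final long syncIntervalBytes;
    private long bytesSinceSync = 0;
    private int syncCount = 0;

    TrickleFsyncWriter(Path path, long syncIntervalBytes) throws IOException {
        this.channel = FileChannel.open(path, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING);
        this.syncIntervalBytes = syncIntervalBytes;
    }

    void write(byte[] data) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap(data);
        while (buf.hasRemaining()) {
            channel.write(buf);
        }
        bytesSinceSync += data.length;
        if (bytesSinceSync >= syncIntervalBytes) {
            channel.force(false); // flush file data (not metadata) to the device
            bytesSinceSync = 0;
            syncCount++;
        }
    }

    int syncCount() {
        return syncCount;
    }

    @Override
    public void close() throws IOException {
        channel.force(false); // final sync for whatever is left
        channel.close();
    }

    // Writes ten 1 KB chunks with a 4 KB interval and reports the intermediate syncs.
    static int demo() {
        try {
            Path tmp = Files.createTempFile("trickle", ".bin");
            try (TrickleFsyncWriter writer = new TrickleFsyncWriter(tmp, 4096)) {
                byte[] chunk = new byte[1024];
                for (int i = 0; i < 10; i++) {
                    writer.write(chunk);
                }
                return writer.syncCount();
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 2: after the 4th and the 8th chunk
    }
}
```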


[jira] [Commented] (CASSANDRA-3948) rename RandomAccessReader.MAX_BYTES_IN_PAGE_CACHE

2012-02-29 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-3948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13219052#comment-13219052
 ] 

Pavel Yaskevich commented on CASSANDRA-3948:


+1

 rename RandomAccessReader.MAX_BYTES_IN_PAGE_CACHE
 -

 Key: CASSANDRA-3948
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3948
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Peter Schuller
Assignee: Pavel Yaskevich
 Fix For: 1.1.0

 Attachments: CASSANDRA-3948-trunk.txt


 This should make the fadvising useless (mostly). See CASSANDRA-1470 for why, 
 including links to kernel source. I have not investigated the history of when 
 this broke or whether it was like this from the beginning.
 For the record I have not confirmed this by testing, only by code inspection. 
 I happened to notice it working on other things, so there is some chance that 
 I'm just mis-reading the code.


[jira] [Commented] (CASSANDRA-3980) Cli should be able to define CompositeType comparators

2012-02-29 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13219259#comment-13219259
 ] 

Pavel Yaskevich commented on CASSANDRA-3980:


And you can't really use just a CompositeType: it returns 'Invalid definition 
for comparator org.apache.cassandra.db.marshal.CompositeType'.

 Cli should be able to define CompositeType comparators
 --

 Key: CASSANDRA-3980
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3980
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Brandon Williams
Assignee: Pavel Yaskevich
 Fix For: 1.0.9


 There is currently no way to define, for instance, 
 CompositeType(UTF8Type,Int32Type) in a CF definition.  


[jira] [Commented] (CASSANDRA-3980) Cli should be able to define CompositeType comparators

2012-02-29 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13219277#comment-13219277
 ] 

Pavel Yaskevich commented on CASSANDRA-3980:


huh, if it works like that I think we can simply document that.

 Cli should be able to define CompositeType comparators
 --

 Key: CASSANDRA-3980
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3980
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Brandon Williams
Assignee: Pavel Yaskevich
 Fix For: 1.0.9


 There is currently no way to define, for instance, 
 CompositeType(UTF8Type,Int32Type) in a CF definition.  


[jira] [Commented] (CASSANDRA-3722) Send Hints to Dynamic Snitch when Compaction or repair is going on for a node.

2012-02-29 Thread Pavel Yaskevich (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-3722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13219639#comment-13219639
]

Pavel Yaskevich commented on CASSANDRA-3722:

Also, maintaining a kind of normalized load statistic for each node, in
combination with pending requests, could give a better view of what is going on
on the node, e.g. load is <= 0.5 but we have a big pending queue for/on the
node: that could mean a network failure. For the statistic we can assign each of
the sub-routines a load impact factor, e.g. compaction - 0.3, scrub - 0.2,
read - 0.01. We can set the load threshold for overloaded nodes, e.g. 0.85
(which could be adjusted at runtime), and sort hosts according to their load +
pending request statistics, which would make penalizing hosts more precise.
Obviously some normalization should be done, because clusters won't always have
nodes with identical processing capabilities (network, hardware, etc.).
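
A rough sketch of how such a score could be computed and used for ordering. The impact factors (0.3, 0.2, 0.01) and the 0.85 threshold are the examples from the comment. Everything else is an assumption made for illustration: the class and field names, the way load and pending requests are combined (a hypothetical weight of 0.001 per pending request), and ranking overloaded nodes last. Normalization across heterogeneous hardware is left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class LoadScoreSketch {
    static final double OVERLOAD_THRESHOLD = 0.85; // example value from the comment
    static final double PENDING_WEIGHT = 0.001;    // hypothetical weight per pending request
    static final Map<String, Double> IMPACT = new HashMap<>();
    static {
        IMPACT.put("compaction", 0.3);
        IMPACT.put("scrub", 0.2);
        IMPACT.put("read", 0.01);
    }

    static final class Node {
        final String name;
        final double load;
        final int pendingRequests;

        Node(String name, Map<String, Integer> activeTasks, int pendingRequests) {
            this.name = name;
            this.load = load(activeTasks);
            this.pendingRequests = pendingRequests;
        }

        boolean overloaded() {
            return load >= OVERLOAD_THRESHOLD;
        }

        double score() {
            return load + PENDING_WEIGHT * pendingRequests;
        }
    }

    // Sum of impact factor times number of running tasks, capped at 1.0.
    static double load(Map<String, Integer> activeTasks) {
        double sum = 0;
        for (Map.Entry<String, Integer> e : activeTasks.entrySet()) {
            sum += IMPACT.getOrDefault(e.getKey(), 0.0) * e.getValue();
        }
        return Math.min(1.0, sum);
    }

    // Nodes below the threshold first, then by ascending combined score.
    static List<Node> rank(List<Node> nodes) {
        List<Node> sorted = new ArrayList<>(nodes);
        sorted.sort((a, b) -> {
            int c = Boolean.compare(a.overloaded(), b.overloaded());
            return c != 0 ? c : Double.compare(a.score(), b.score());
        });
        return sorted;
    }
}
```

A node with low load but a long pending queue (the possible network failure case mentioned above) still ends up behind healthier nodes, because the queue length feeds into its score.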

Send Hints to Dynamic Snitch when Compaction or repair is going on for a node.
--

Key: CASSANDRA-3722
URL: https://issues.apache.org/jira/browse/CASSANDRA-3722
Project: Cassandra
Issue Type: Improvement
Components: Core
Affects Versions: 1.1.0
Reporter: Vijay
Assignee: Vijay
Priority: Minor

Currently Dynamic snitch looks at the latency for figuring out which node
will be better serving the requests. This works great, but there is a part of
the traffic sent to collect this data... There is also a window when Snitch
doesn't know about some major events which are going to happen on the node
(the node which is going to receive the data request).
It would be great if we could send some sort of hints to the Snitch so it can
score based on known events causing higher latencies.

[jira] [Commented] (CASSANDRA-3294) a node whose TCP connection is not up should be considered down for the purpose of reads and writes

2012-02-27 Thread Pavel Yaskevich (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13217196#comment-13217196
]

Pavel Yaskevich commented on CASSANDRA-3294:

How about we assign each node in the ring a probability of being alive
(starting from a uniform distribution)? With each failure, e.g. an
RPC/Gossiper communication error, we would decrease the node's probability of
being alive by a constant factor, and increase it by another constant factor if
communication was successful. That would allow us to pick the endpoint with the
highest alive probability (with all the others sorted) for a sub-group of
SS.getLiveNaturalEndpoints(String, RingPosition). What do you think?
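
A minimal sketch of that bookkeeping, assuming multiplicative updates and a cap at 1.0. The class, the method names and the concrete factors are hypothetical; the comment only says "constant factor" and does not fix how the probabilities are stored or updated.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class AliveProbabilitySketch {
    private final Map<String, Double> probability = new HashMap<>();
    private final double failureFactor; // expected to be below 1
    private final double successFactor; // expected to be above 1

    AliveProbabilitySketch(List<String> endpoints, double initial,
                           double failureFactor, double successFactor) {
        // Every endpoint starts with the same probability (uniform start).
        for (String endpoint : endpoints) {
            probability.put(endpoint, initial);
        }
        this.failureFactor = failureFactor;
        this.successFactor = successFactor;
    }

    void onFailure(String endpoint) {
        probability.computeIfPresent(endpoint, (k, p) -> p * failureFactor);
    }

    void onSuccess(String endpoint) {
        probability.computeIfPresent(endpoint, (k, p) -> Math.min(1.0, p * successFactor));
    }

    double probabilityOf(String endpoint) {
        return probability.getOrDefault(endpoint, 0.0);
    }

    // Orders the candidates from most to least likely alive.
    List<String> mostLikelyAliveFirst(List<String> candidates) {
        List<String> sorted = new ArrayList<>(candidates);
        sorted.sort((a, b) -> Double.compare(probabilityOf(b), probabilityOf(a)));
        return sorted;
    }
}
```

The candidate list here stands in for the result of the natural endpoints lookup, so the ordering only ever applies to replicas of the requested key.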

a node whose TCP connection is not up should be considered down for the
purpose of reads and writes
---

Key: CASSANDRA-3294
URL: https://issues.apache.org/jira/browse/CASSANDRA-3294
Project: Cassandra
Issue Type: Improvement
Reporter: Peter Schuller
Assignee: Peter Schuller

Cassandra fails to handle the most simple of cases intelligently - a process
gets killed and the TCP connection dies. I cannot see a good reason to wait
for a bunch of RPC timeouts and thousands of hung requests to realize that we
shouldn't be sending messages to a node when the only possible means of
communication is confirmed down. This is why one has to disablegossip and
wait for a while to restart a node on a busy cluster (especially without
CASSANDRA-2540 but that only helps under certain circumstances).
A more generalized approach whereby one e.g. weights in the number of
currently outstanding RPC requests to a node, would likely take care of this
case as well. But until such a thing exists and works well, it seems prudent
to have the very common and controlled form of failure be handled better.
Are there difficulties I'm not seeing?
I can see that one may want to distinguish between considering something
really down (and e.g. fail a repair because it's down) from what I'm
talking about, so maybe there are different concepts (say one is currently
unreachable rather than down) being conflated. But in the specific case of
sending reads/writes to a node we *know* we cannot talk to, it seems
unnecessarily detrimental.

[jira] [Commented] (CASSANDRA-3294) a node whose TCP connection is not up should be considered down for the purpose of reads and writes

2012-02-27 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13217259#comment-13217259
 ] 

Pavel Yaskevich commented on CASSANDRA-3294:


The main idea of the algorithm I have mentioned is to make sure that we always 
do operations (write/read etc.) on the nodes that have the highest probability 
of being alive, as determined by live traffic going there, instead of passively 
relying on the failure detector.

 a node whose TCP connection is not up should be considered down for the 
 purpose of reads and writes
 ---

 Key: CASSANDRA-3294
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3294
 Project: Cassandra
  Issue Type: Improvement
Reporter: Peter Schuller
Assignee: Peter Schuller

 Cassandra fails to handle the most simple of cases intelligently - a process 
 gets killed and the TCP connection dies. I cannot see a good reason to wait 
 for a bunch of RPC timeouts and thousands of hung requests to realize that we 
 shouldn't be sending messages to a node when the only possible means of 
 communication is confirmed down. This is why one has to disablegossip and 
 wait for a while to restart a node on a busy cluster (especially without 
 CASSANDRA-2540 but that only helps under certain circumstances).
 A more generalized approach whereby one e.g. weights in the number of 
 currently outstanding RPC requests to a node, would likely take care of this 
 case as well. But until such a thing exists and works well, it seems prudent 
 to have the very common and controlled form of failure be handled better.
 Are there difficulties I'm not seeing?
 I can see that one may want to distinguish between considering something 
 really down (and e.g. fail a repair because it's down) from what I'm 
 talking about, so maybe there are different concepts (say one is currently 
 unreachable rather than down) being conflated. But in the specific case of 
 sending reads/writes to a node we *know* we cannot talk to, it seems 
 unnecessarily detrimental.


[jira] [Commented] (CASSANDRA-3962) CassandraStorage can't cast fields from a CF correctly

2012-02-27 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-3962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13217271#comment-13217271
 ] 

Pavel Yaskevich commented on CASSANDRA-3962:


+1

 CassandraStorage can't cast fields from a CF correctly
 --

 Key: CASSANDRA-3962
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3962
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 1.0.8
 Environment: OSX 10.6.latest, Pig 0.9.2.
Reporter: Janne Jalkanen
Assignee: Brandon Williams
  Labels: hadoop, pig
 Fix For: 1.0.9, 1.1.0

 Attachments: 0001-Add-LoadCaster-to-CassandraStorage.txt, 
 0002-Compose-key-from-marshaller.txt, test.cli, test.pig


 Included scripts demonstrate the problem.  Regardless of whether the key is 
 cast as a chararray or not, the Pig scripts fail with 
 {code}
 java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be 
 cast to java.lang.String
   at 
 org.apache.pig.backend.hadoop.HDataType.getWritableComparableTypes(HDataType.java:72)
   at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Map.collect(PigGenericMapReduce.java:117)
   at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:269)
   at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:262)
   at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at 
 org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
 {code}

[jira] [Commented] (CASSANDRA-3294) a node whose TCP connection is not up should be considered down for the purpose of reads and writes

2012-02-27 Thread Pavel Yaskevich (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13217398#comment-13217398
]

Pavel Yaskevich commented on CASSANDRA-3294:

After reading CASSANDRA-3722, it seems we can implement the required logic at
the snitch level, taking latency measurements into account. I think we can close
this one in favor of CASSANDRA-3722 and continue work/discussion there. What do
you think, Brandon, Peter?

a node whose TCP connection is not up should be considered down for the
purpose of reads and writes
---

Key: CASSANDRA-3294
URL: https://issues.apache.org/jira/browse/CASSANDRA-3294
Project: Cassandra
Issue Type: Improvement
Reporter: Peter Schuller
Assignee: Peter Schuller

[jira] [Commented] (CASSANDRA-3953) Replace deprecated and removed CfDef and KsDef attributes in thrift spec

2012-02-27 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-3953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13217612#comment-13217612
 ] 

Pavel Yaskevich commented on CASSANDRA-3953:


That does not affect the new schema functions at all, because those attributes 
won't be set if they are not present in the serialized schema (which is fine 
because they are optional), and even if they are, it will just set them :)

 Replace deprecated and removed CfDef and KsDef attributes in thrift spec
 

 Key: CASSANDRA-3953
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3953
 Project: Cassandra
  Issue Type: Improvement
  Components: API
Affects Versions: 1.0.0
Reporter: paul cannon
Assignee: paul cannon
Priority: Minor
  Labels: thrift_protocol
 Fix For: 1.1.0


 In a discussion on irc this morning around the interface backwards 
 compatibility topic (as explained in CASSANDRA-3951), the opinion was 
 expressed that it might not hurt to provide backwards compat for c* servers 
 as well as clients.
 This could be done by adding back all CfDef and KsDef attributes that were 
 removed since thrift spec 19.0.0 (0.7.0-beta2). Namely:
 * bool CfDef.preload_row_cache (only in 0.7.0 betas; probably not necessary)
 * double CfDef.row_cache_size
 * double CfDef.key_cache_size
 * i32 CfDef.row_cache_save_period_in_seconds
 * i32 CfDef.key_cache_save_period_in_seconds
 * i32 CfDef.memtable_flush_after_mins
 * i32 CfDef.memtable_throughput_in_mb
 * double CfDef.memtable_operations_in_millions
 * string CfDef.row_cache_provider
 * i32 CfDef.row_cache_keys_to_save
 * double CfDef.merge_shards_chance
 * i32 KsDef.replication_factor
 Obviously these attributes should not be expected to have any effect when 
 used with the current version of Cassandra; they may be marked ignored, 
 unused, or deprecated or whatever, as appropriate.
 This should allow library software to be built against one thrift spec (the 
 latest) and be then expected to work (keeping all necessary attributes 
 available and usable) against any Cassandra version back to 0.7.0-beta2.
 (To really achieve this goal 100%, we should reinstate the 
 system_rename_column_family() and system_rename_keyspace() calls too, and 
 just have them raise InvalidRequestException, but they never really worked 
 anyway, so it's probably better to leave them out.)

[jira] [Commented] (CASSANDRA-3950) support trickling fsync() on writes

2012-02-25 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-3950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216594#comment-13216594
 ] 

Pavel Yaskevich commented on CASSANDRA-3950:


No problem, Peter. Take your time.

 support trickling fsync() on writes
 ---

 Key: CASSANDRA-3950
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3950
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Peter Schuller
Assignee: Peter Schuller
 Fix For: 1.1.0

 Attachments: CASSANDRA-3950-1.1-v2.txt, CASSANDRA-3950-1.1.txt


 Attaching a patch to support fsync():ing every N megabytes of data written 
 using sequential writers. The motivation is to avoid the kernel flushing out 
 pages in bulk.
 It makes sense for both platters and SSD:s, but it's particularly good for 
 SSD:s because the negative consequences of fsync():ing more often are much 
 more limited than with platters, and the *need* is to some extent greater 
 because of the fact that with SSD:s you're much more likely to be e.g. 
 streaming data quickly or compacting quickly, since you're not having to 
 throttle everything as extremely as with platters, and you easily write fast 
 enough for this to be a problem if you're targeting good latency at the 
 outliers.
 I'm nominating it for 1.1.0 because, if disabled, the probability of this 
 being a regression seems very low.

[jira] [Commented] (CASSANDRA-3948) SequentialWriter doesn't fsync() before posix_fadvise()

2012-02-25 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-3948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216597#comment-13216597
 ] 

Pavel Yaskevich commented on CASSANDRA-3948:


Exactly, we can't really control (measure) the contents of the page cache, so 
instead we just define intervals for our files at which to call fadvise.

 SequentialWriter doesn't fsync() before posix_fadvise()
 ---

 Key: CASSANDRA-3948
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3948
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Peter Schuller
Assignee: Pavel Yaskevich
 Fix For: 1.1.0


 This should make the fadvising useless (mostly). See CASSANDRA-1470 for why, 
 including links to kernel source. I have not investigated the history of when 
 this broke or whether it was like this from the beginning.
 For the record I have not confirmed this by testing, only by code inspection. 
 I happened to notice it working on other things, so there is some chance that 
 I'm just mis-reading the code.

[jira] [Commented] (CASSANDRA-2261) During Compaction, Corrupt SSTables with rows that cause failures should be identified and blacklisted.

2012-02-24 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215623#comment-13215623
 ] 

Pavel Yaskevich commented on CASSANDRA-2261:


Ben, are you going to work on this?

 During Compaction, Corrupt SSTables with rows that cause failures should be 
 identified and blacklisted.
 ---

 Key: CASSANDRA-2261
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2261
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benjamin Coverston
Assignee: Benjamin Coverston
Priority: Minor
  Labels: not_a_pony
 Fix For: 1.1.1

 Attachments: 2261-v2.patch, 2261.patch


 When a compaction of a set of SSTables fails because of corruption it will 
 continue to try to compact that SSTable causing pending compactions to build 
 up.
 One way to mitigate this problem would be to log the error, then identify the 
 specific SSTable that caused the failure, subsequently blacklisting that 
 SSTable and ensuring that it is no longer included in future compactions. For 
 this we could simply store the problematic SSTable's name in memory.
 If it's not possible to identify the SSTable that caused the issue, then 
 perhaps blacklisting the (ordered) permutation of SSTables to be compacted 
 together is something that can be done to solve this problem in a more 
 general case, and avoid issues where two (or more) SSTables have trouble 
 compacting a particular row. For this option we would probably want to store 
 the lists of the bad combinations in the system table somewhere s.t. these 
 can survive a node failure (there have been a few cases where I have seen a 
 compaction cause a node failure).
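
The first option in the description (keeping the problematic SSTable's name in memory) might look roughly like the sketch below. The class and method names are hypothetical and do not come from the attached patches; SSTables are represented by plain name strings to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of an in-memory SSTable blacklist; not the patch's code.
final class SSTableBlacklistSketch {
    // Held in memory only, so the contents would not survive a node restart.
    private final Set<String> blacklisted = ConcurrentHashMap.newKeySet();

    // Record an SSTable whose compaction failed because of corruption.
    void markCorrupt(String sstableName) {
        blacklisted.add(sstableName);
    }

    // Return only the candidates that are still eligible for compaction.
    List<String> filter(List<String> candidates) {
        List<String> eligible = new ArrayList<>();
        for (String name : candidates) {
            if (!blacklisted.contains(name)) {
                eligible.add(name);
            }
        }
        return eligible;
    }
}
```

A compaction task would call markCorrupt() from its error handler and pass every candidate set through filter() before starting. Persisting the set, as the second option suggests, is left out here.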

[jira] [Commented] (CASSANDRA-3948) SequentialWriter doesn't fsync() before posix_fadvise()

2012-02-23 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-3948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214616#comment-13214616
 ] 

Pavel Yaskevich commented on CASSANDRA-3948:


The algorithm behind it: we do fadvise(fd, start_position, 0) after each 
128MB of data written. A flush is done as part of each re-buffer (every 64KB by 
default), so we can skip doing a sync when we call fadvise() and just pass 0 as 
the length, which hints the kernel to skip everything starting from 
start_position.
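
As a rough illustration of the interval logic described here (not the actual SequentialWriter code), the sketch below counts bytes per re-buffer and records the offset that would be passed to fadvise each time 128MB has been written. All names are hypothetical, and the native call is replaced by a stand-in that only records its offset argument.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the fadvise interval bookkeeping; not Cassandra's code.
final class FadviseIntervalSketch {
    static final long CACHE_FLUSH_INTERVAL = 128L * 1024 * 1024; // 128MB, as in the comment
    static final int BUFFER_SIZE = 64 * 1024;                    // 64KB default re-buffer size

    private long position = 0;             // total bytes written so far
    private long startPosition = 0;        // start of the region not yet advised
    private long bytesSinceCacheFlush = 0; // bytes written since the last fadvise
    final List<Long> fadviseOffsets = new ArrayList<>();

    // One re-buffer: the buffer is flushed to the file, then the interval is checked.
    void reBuffer() {
        position += BUFFER_SIZE;
        bytesSinceCacheFlush += BUFFER_SIZE;
        if (bytesSinceCacheFlush >= CACHE_FLUSH_INTERVAL) {
            fadvise(startPosition, 0); // length 0: everything from startPosition onwards
            startPosition = position;
            bytesSinceCacheFlush = 0;
        }
    }

    // Stand-in for the native call; it only records the offset it was given.
    private void fadvise(long offset, long length) {
        fadviseOffsets.add(offset);
    }
}
```

With these constants a call is recorded once every 2048 re-buffers (128MB / 64KB).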

 SequentialWriter doesn't fsync() before posix_fadvise()
 ---

 Key: CASSANDRA-3948
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3948
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Peter Schuller
Assignee: Pavel Yaskevich
 Fix For: 1.1.0


 This should make the fadvising useless (mostly). See CASSANDRA-1470 for why, 
 including links to kernel source. I have not investigated the history of when 
 this broke or whether it was like this from the beginning.
 For the record I have not confirmed this by testing, only by code inspection. 
 I happened to notice it working on other things, so there is some chance that 
 I'm just mis-reading the code.

[jira] [Commented] (CASSANDRA-3950) support trickling fsync() on writes

2012-02-23 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-3950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214630#comment-13214630
 ] 

Pavel Yaskevich commented on CASSANDRA-3950:


The problems with your patch:

# you use bytesSinceCacheFlush instead of bytesSinceTrickleFsync in the 
flushInternal() method.
# trickleFsyncByteInterval should be multiplied by 1024 * 1024 in the SW 
constructor to convert it to bytes, because bytesSinceTrickleFsync is counted at 
byte granularity.
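
A minimal sketch of what the two points amount to, assuming the interval is configured in megabytes and the writer counts flushed bytes. The class, constructor, and onFlush() method are hypothetical; only the two field names come from the comment above, and the fsync is replaced by a counter.

```java
// Hypothetical sketch; not the SequentialWriter code from the patch.
final class TrickleFsyncCounter {
    // Configured in megabytes but compared against a byte counter,
    // so it is converted to bytes once, in the constructor.
    private final long trickleFsyncByteInterval;
    private long bytesSinceTrickleFsync = 0;
    private int syncCount = 0;

    TrickleFsyncCounter(int intervalInMb) {
        this.trickleFsyncByteInterval = intervalInMb * 1024L * 1024L;
    }

    // Called after each buffer flush with the number of bytes just written.
    void onFlush(int bytesFlushed) {
        bytesSinceTrickleFsync += bytesFlushed;
        if (bytesSinceTrickleFsync >= trickleFsyncByteInterval) {
            sync();
            bytesSinceTrickleFsync = 0; // reset the trickle counter, not the cache-flush one
        }
    }

    // Stand-in for a real fsync; here it only counts the calls.
    private void sync() {
        syncCount++;
    }

    int syncCount() {
        return syncCount;
    }
}
```

Without the conversion, an interval of 1 would be compared against a byte count and trigger a sync on every flush.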

 support trickling fsync() on writes
 ---

 Key: CASSANDRA-3950
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3950
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Peter Schuller
Assignee: Peter Schuller
 Fix For: 1.1.0

 Attachments: CASSANDRA-3950-1.1.txt


 Attaching a patch to support fsync():ing every N megabytes of data written 
 using sequential writers. The motivation is to avoid the kernel flushing out 
 pages in bulk.
 It makes sense for both platters and SSD:s, but it's particularly good for 
 SSD:s because the negative consequences of fsync():ing more often are much 
 more limited than with platters, and the *need* is to some extent greater 
 because of the fact that with SSD:s you're much more likely to be e.g. 
 streaming data quickly or compacting quickly, since you're not having to 
 throttle everything as extremely as with platters, and you easily write fast 
 enough for this to be a problem if you're targeting good latency at the 
 outliers.
 I'm nominating it for 1.1.0 because, if disabled, the probability of this 
 being a regression seems very low.

[jira] [Commented] (CASSANDRA-3884) Intermittent SchemaDisagreementException

2012-02-23 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-3884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214634#comment-13214634
 ] 

Pavel Yaskevich commented on CASSANDRA-3884:


The answer to your main question is compaction. Everything works well when 
you add or modify columns, but when you e.g. delete cf columns from a keyspace 
and compaction kicks in before you have grabbed the whole schema, that schema 
will be missing the updates for those columns, so they won't be pushed to the 
remote nodes, leaving the cf attributes in their schema_columnfamilies.

bq. In MigrationHelper if withSchemaRecord is false the mutations will be null, 
and most functions will return a list containing null. We should return an empty 
list instead or null (but in that last case, Migration.apply() should deal with 
null). Also MigrationHelper.dropColumnFamily() directly returns null, so we 
should make it match whatever we do for the other methods.

Sure, I will make it return Collections.singleton()

bq. It's slightly more efficient to use Collections.singleton() than 
Arrays.asList with one element.

Sure, will change it in updated v2.
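
For illustration only, the two review points quoted above could be combined as in the sketch below. The helper names are hypothetical, and mutations are represented by strings so the example stays self-contained.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;

// Hypothetical sketch; MigrationHelper's real signatures are not shown in this thread.
final class MutationListSketch {
    // Wraps an optional mutation without ever returning a collection
    // that contains null: empty when absent, a singleton otherwise.
    static Collection<String> toCollection(String mutation) {
        return mutation == null
             ? Collections.<String>emptyList()
             : Collections.singleton(mutation);
    }

    // What the review warns about: Arrays.asList with a null element
    // yields a one-element list holding null.
    static Collection<String> wrapNaively(String mutation) {
        return Arrays.asList(mutation);
    }
}
```

Callers of toCollection() can iterate over the result without a null check, which is the point of returning an empty list rather than a list containing null.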

bq. Why does the tests now need to start gossip?

I have experienced gossip-related NPEs (in the isEnabled() method, for 
example) in the MM.passiveAnnounce method when the Gossiper wasn't started.

 Intermittent SchemaDisagreementException
 

 Key: CASSANDRA-3884
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3884
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.1.0
 Environment: using ccm on ubuntu. 
Reporter: Tyler Patterson
Assignee: Pavel Yaskevich
 Fix For: 1.1.0

 Attachments: CASSANDRA-3884.patch


 Set up a cluster of two nodes (on cassandra-1.1), create some keyspaces and 
 column families, and then make several schema changes. Everything is being 
 done through only one of the nodes.  About once every 10 times (on my setup) 
 I get a SchemaDisagreementException when creating and dropping keyspaces. 
 There is a dtest for this: schema_changes_test.py. If your environment 
 behaves like mine, you might need to run it 10 times to get the error.

[jira] [Commented] (CASSANDRA-3950) support trickling fsync() on writes

2012-02-23 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-3950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215041#comment-13215041
 ] 

Pavel Yaskevich commented on CASSANDRA-3950:


A few last nits:

# inside your if condition you still use bytesSinceCacheFlush instead of 
bytesSinceTrickleFsync when resetting the counter to 0.
# "Requires JNA." can be removed from the comment in conf/cassandra.yaml since 
the built-in getFD().sync() is used.

 support trickling fsync() on writes
 ---

 Key: CASSANDRA-3950
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3950
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Peter Schuller
Assignee: Peter Schuller
 Fix For: 1.1.0

 Attachments: CASSANDRA-3950-1.1-v2.txt, CASSANDRA-3950-1.1.txt


 Attaching a patch to support fsync():ing every N megabytes of data written 
 using sequential writers. The motivation is to avoid the kernel flushing out 
 pages in bulk.
 It makes sense for both platters and SSD:s, but it's particularly good for 
 SSD:s because the negative consequences of fsync():ing more often are much 
 more limited than with platters, and the *need* is to some extent greater 
 because of the fact that with SSD:s you're much more likely to be e.g. 
 streaming data quickly or compacting quickly, since you're not having to 
 throttle everything as extremely as with platters, and you easily write fast 
 enough for this to be a problem if you're targeting good latency at the 
 outliers.
 I'm nominating it for 1.1.0 because, if disabled, the probability of this 
 being a regression seems very low.

[jira] [Commented] (CASSANDRA-3012) cassandra-cli list CF limit number of columns returned

2012-02-23 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-3012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215180#comment-13215180
 ] 

Pavel Yaskevich commented on CASSANDRA-3012:


How about syntax like 'list cf[...] limit X (columns Y)?', which would allow 
you to simply omit the 'columns Y' block when you want to return all columns 
and would make the syntax more intuitive, because 'limit 10, 30' could be read 
as limit min 10, max 30 rows?
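
To make the proposal concrete, here is a small sketch that parses only the 'limit X (columns Y)?' part with a regular expression. This is not how the CLI grammar is implemented; the class name is hypothetical, and -1 is used for "all columns" following the convention suggested in the issue description.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the proposed clause; the real CLI does not parse this way.
final class ListLimitSyntaxSketch {
    // "limit X", optionally followed by "columns Y".
    private static final Pattern LIMIT = Pattern.compile(
        "limit\\s+(\\d+)(?:\\s+columns\\s+(\\d+))?", Pattern.CASE_INSENSITIVE);

    // Returns {rowLimit, columnLimit}; columnLimit is -1 (all columns)
    // when the "columns Y" block is omitted.
    static int[] parse(String clause) {
        Matcher m = LIMIT.matcher(clause.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized limit clause: " + clause);
        }
        int rows = Integer.parseInt(m.group(1));
        int columns = m.group(2) == null ? -1 : Integer.parseInt(m.group(2));
        return new int[] { rows, columns };
    }
}
```

Under this sketch 'limit 10 columns 30' and 'limit 10' are both accepted, while the ambiguous 'limit 10, 30' form is rejected.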

 cassandra-cli list CF limit number of columns returned
 

 Key: CASSANDRA-3012
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3012
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 1.0.7
Reporter: Aaron Turner
Assignee: Dave Brosius
Priority: Minor
  Labels: cli, limit
 Fix For: 1.1.1

 Attachments: cli_list_columncount.diff


 Right now in the CLI, running: list MyColumnFamily; will return the first 100 
 rows and quite easily the bazillion columns associated with those rows.  
 Often times you're interested in just the row keys in the CF and less 
 interested in all the columns or perhaps only a subset of columns.
 Hence it would be nice to have the limit option take a second, optional 
 parameter limiting the number of columns to return:
 list MyCF[startkey:] limit 10, 30;
 would limit the columns per row to 30 while limiting the number of rows to 10 
 and starting at key startkey.  It should also take values 0 (no columns) 
 and -1 (all columns).  -1 should also be acceptable as a row limit also 
 denoting all rows rather then requiring the user to type a large positive 
 number.

[jira] [Commented] (CASSANDRA-3931) gossipers notion of schema differs from reality as reported by the nodes in question

2012-02-21 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-3931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13212821#comment-13212821
 ] 

Pavel Yaskevich commented on CASSANDRA-3931:


+1 with nit: the change in Migration is unnecessary because passiveAnnounce gets 
called as part of the Migration.announce() routine, so we don't need to change 
the apply() behavior.

 gossipers notion of schema differs from reality as reported by the nodes in 
 question
 

 Key: CASSANDRA-3931
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3931
 Project: Cassandra
  Issue Type: Bug
Reporter: Peter Schuller
Assignee: Brandon Williams
 Fix For: 1.1.0

 Attachments: 3931-v2.txt, 3931.txt


 On a 1.1 cluster we happened to notice that {{nodetool gossipinfo | grep 
 SCHEMA}} reported disagreement:
 {code}
   SCHEMA:59adb24e-f3cd-3e02-97f0-5b395827453f
   SCHEMA:59adb24e-f3cd-3e02-97f0-5b395827453f
   SCHEMA:b0d7bab7-c13c-37d9-9adb-8ab8a5b7215d
   SCHEMA:59adb24e-f3cd-3e02-97f0-5b395827453f
   SCHEMA:59adb24e-f3cd-3e02-97f0-5b395827453f
   SCHEMA:59adb24e-f3cd-3e02-97f0-5b395827453f
   SCHEMA:59adb24e-f3cd-3e02-97f0-5b395827453f
   SCHEMA:59adb24e-f3cd-3e02-97f0-5b395827453f
   SCHEMA:59adb24e-f3cd-3e02-97f0-5b395827453f
   SCHEMA:59adb24e-f3cd-3e02-97f0-5b395827453f
   SCHEMA:bcdbd318-82df-3518-89e3-6b72227b3f66
   SCHEMA:59adb24e-f3cd-3e02-97f0-5b395827453f
   SCHEMA:59adb24e-f3cd-3e02-97f0-5b395827453f
   SCHEMA:59adb24e-f3cd-3e02-97f0-5b395827453f
   SCHEMA:bcdbd318-82df-3518-89e3-6b72227b3f66
   SCHEMA:59adb24e-f3cd-3e02-97f0-5b395827453f
   SCHEMA:59adb24e-f3cd-3e02-97f0-5b395827453f
   SCHEMA:59adb24e-f3cd-3e02-97f0-5b395827453f
   SCHEMA:59adb24e-f3cd-3e02-97f0-5b395827453f
   SCHEMA:59adb24e-f3cd-3e02-97f0-5b395827453f
 {code}
 However, the result of a thrift {{describe_ring}} on the cluster claims they 
 all agree and that {{b0d7bab7-c13c-37d9-9adb-8ab8a5b7215d}} is the schema 
 they have.
 The schemas seem to actually propagate; e.g. dropping a keyspace actually 
 drops the keyspace.

[jira] [Commented] (CASSANDRA-3932) schema IAE and read path NPE after cluster re-deploy

2012-02-20 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-3932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13212040#comment-13212040
 ] 

Pavel Yaskevich commented on CASSANDRA-3932:


Can you please attach logs from the nodes involved so we can see what happened 
during the schema propagation?

 schema IAE and read path NPE after cluster re-deploy
 

 Key: CASSANDRA-3932
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3932
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.1.0
Reporter: Peter Schuller
Assignee: Pavel Yaskevich
 Fix For: 1.1.0


 On the same cluster (but later) as the one where we observed CASSANDRA-3931 
 we were running some performance/latency testing. ycsb reads, plus a separate 
 little python client. All was fine.
 I then did a fast re-deploy for changed GC settings, which would have led to 
 a complete cluster restart almost simultaneously (triggering races?). When I 
 re-ran my Python client, I suddenly got an error saying Keyspace1 did not 
 exist. On re-run I started getting timeouts. Looking at the endpoints of the 
 key that I was getting a timeout for, the first error ever seen is:
 {code}
 java.lang.IllegalArgumentException: Unknown ColumnFamily Standard1 in 
 keyspace Keyspace1
 at org.apache.cassandra.config.Schema.getComparator(Schema.java:234)
 at 
 org.apache.cassandra.db.ColumnFamily.getComparatorFor(ColumnFamily.java:312)
 at 
 org.apache.cassandra.db.ReadCommand.getComparator(ReadCommand.java:94)
 at 
 org.apache.cassandra.db.SliceByNamesReadCommand.init(SliceByNamesReadCommand.java:44)
 at 
 org.apache.cassandra.db.SliceByNamesReadCommandSerializer.deserialize(SliceByNamesReadCommand.java:113)
 at 
 org.apache.cassandra.db.SliceByNamesReadCommandSerializer.deserialize(SliceByNamesReadCommand.java:81)
 at 
 org.apache.cassandra.db.ReadCommandSerializer.deserialize(ReadCommand.java:134)
 at 
 org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:53)
 at 
 org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
 {code}
 And later in the read path NPE:s like these:
 {code}
 java.lang.NullPointerException
 at 
 org.apache.cassandra.db.Table.createReplicationStrategy(Table.java:321)
 at org.apache.cassandra.db.Table.init(Table.java:277)
 at org.apache.cassandra.db.Table.open(Table.java:120)
 at org.apache.cassandra.db.Table.open(Table.java:103)
 at 
 org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:54)
 at 
 org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
 {code}

[jira] [Commented] (CASSANDRA-3815) Alllow compression setting adjustment via JMX

2012-02-17 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210145#comment-13210145
 ] 

Pavel Yaskevich commented on CASSANDRA-3815:


+1

 Alllow compression setting adjustment via JMX
 -

 Key: CASSANDRA-3815
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3815
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Brandon Williams
Assignee: Brandon Williams
 Fix For: 1.1.0

 Attachments: 3815.txt


 As the title says, let's allow enabling/disabling/setting chunk size via JMX.

[jira] [Commented] (CASSANDRA-3628) Make Pig/CassandraStorage delete functionality disabled by default and configurable

2012-02-17 Thread Pavel Yaskevich (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210181#comment-13210181
]

Pavel Yaskevich commented on CASSANDRA-3628:

+1 on v2 with nit - would be better to change the error messages to something
like
- null found, but deletes are disabled, to enable deletes use +
PIG_ALLOW_DELETES + =true
- SuperColumn deletion attempted with empty bag, but deletes are disabled, to
enable deletes use + PIG_ALLOW_DELETES + =true

Make Pig/CassandraStorage delete functionality disabled by default and
configurable
---

Key: CASSANDRA-3628
URL: https://issues.apache.org/jira/browse/CASSANDRA-3628
Project: Cassandra
Issue Type: Task
Reporter: Jeremy Hanna
Assignee: Brandon Williams
Labels: pig
Fix For: 1.0.8

Attachments: 3628-v2.txt, 3628.txt

Right now, there is a way to delete a column with the CassandraStorage
loadstorefunc. In practice it is a bad idea to have that enabled by default.
A scenario: do an outer join and you don't have a value for something and
then you write out to cassandra all of the attributes of that relation.
You've just inadvertently deleted a column for all the rows that didn't have
that value as a result of the outer join. It can be argued that you want to
be careful with how you project after the join. However, I would think
disabling by default and having a configurable property to enable it for the
instances when you explicitly want to use it is the right plan.
Fwiw, we had a bug in one of our scripts that did exactly as described above.
It's good to fix the bug. It's bad to implicitly delete data.

[jira] [Commented] (CASSANDRA-3804) upgrade problems from 1.0 to trunk

2012-02-16 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-3804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13209423#comment-13209423
 ] 

Pavel Yaskevich commented on CASSANDRA-3804:


I will investigate how truncate is related to the schema modifications. I don't 
think that we have any intention to patch both 1.0 and 1.1 because we don't 
want to require people to update to 1.0.8+ before upgrading to 1.1. 

 upgrade problems from 1.0 to trunk
 --

 Key: CASSANDRA-3804
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3804
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.1.0
 Environment: ubuntu, cluster set up with ccm.
Reporter: Tyler Patterson
Assignee: Pavel Yaskevich
 Fix For: 1.1.0

 Attachments: CASSANDRA-3804-1.1.patch, CASSANDRA-3804.patch, 
 node1.log, node2.log


 A 3-node cluster is on version 0.8.9, 1.0.6, or 1.0.7 and then one and only 
 one node is taken down, upgraded to trunk, and started again. An rpc timeout 
 exception happens if counter-add operations are done. It usually takes 
 between 1 and 500 add operations before the failure occurs. The failure seems 
 to happen sooner if the coordinator node is NOT the one that was upgraded. 
 Here is the error: 
 {code}
 ==
 ERROR: counter_upgrade_test.TestCounterUpgrade.counter_upgrade_test
 --
 Traceback (most recent call last):
   File "/usr/lib/pymodules/python2.7/nose/case.py", line 187, in runTest
     self.test(*self.arg)
   File "/home/tahooie/cassandra-dtest/counter_upgrade_test.py", line 50, in counter_upgrade_test
     cursor.execute("UPDATE counters SET row = row+1 where key='a'")
   File "/usr/local/lib/python2.7/dist-packages/cql/cursor.py", line 96, in execute
     raise cql.OperationalError("Request did not complete within rpc_timeout.")
 OperationalError: Request did not complete within rpc_timeout.
 {code}
 A script has been added to cassandra-dtest (counter_upgrade_test.py) to 
 demonstrate the failure. The newest version of CCM is required to run the 
 test. It is available here if it hasn't yet been pulled: 
 g...@github.com:tpatterson/ccm.git


[jira] [Commented] (CASSANDRA-3804) upgrade problems from 1.0 to trunk

2012-02-16 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-3804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13209468#comment-13209468
 ] 

Pavel Yaskevich commented on CASSANDRA-3804:


If you take a look at Jonathan's comment from 13/Feb/12 23:34, it will 
give you a better understanding of why there are both 1.0 and 1.1 patches.

 upgrade problems from 1.0 to trunk

[jira] [Commented] (CASSANDRA-3886) Pig can't store some types after loading them

2012-02-13 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-3886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13207192#comment-13207192
 ] 

Pavel Yaskevich commented on CASSANDRA-3886:


+1

 Pig can't store some types after loading them
 -

 Key: CASSANDRA-3886
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3886
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 0.8.7
Reporter: Brandon Williams
Assignee: Brandon Williams
 Fix For: 1.0.8

 Attachments: 3886.txt


 In CASSANDRA-2810, we removed the decompose methods in putNext, relying 
 instead on objToBB; however, it cannot sufficiently handle all types.  For 
 instance, if longs are loaded and then an attempt to store them is made, this 
 causes a cast exception: java.io.IOException: java.io.IOException: 
 java.lang.ClassCastException: java.lang.Long cannot be cast to 
 org.apache.pig.data.DataByteArray Output must be (key, {(column,value)...}) 
 for ColumnFamily or (key, {supercolumn:{(column,value)...}...}) for 
 SuperColumnFamily


[jira] [Commented] (CASSANDRA-3804) upgrade problems from 1.0 to trunk

2012-02-13 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-3804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13207347#comment-13207347
 ] 

Pavel Yaskevich commented on CASSANDRA-3804:


We can do that. The `java.lang.UnsupportedOperationException: Not a time-based 
UUID` error also happens on 1.0 nodes, so I thought it would be appropriate to 
fix it there as well.

 upgrade problems from 1.0 to trunk
 --

 Key: CASSANDRA-3804
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3804
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.0.7
 Environment: ubuntu, cluster set up with ccm.
Reporter: Tyler Patterson
Assignee: Pavel Yaskevich
 Fix For: 1.0.8

 Attachments: CASSANDRA-3804.patch



[jira] [Commented] (CASSANDRA-3884) Intermittent SchemaDisagreementException

2012-02-11 Thread Pavel Yaskevich (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-3884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13206087#comment-13206087
]

Pavel Yaskevich commented on CASSANDRA-3884:

The first rule of schema migration (which you have violated) is to make sure
that the cluster is reasonably quiet for the KS/CF you are updating, because
various bad things can happen if you migrate under heavy load. Can you please
attach to the ticket the system.log from both nodes, captured when you get the
SchemaDisagreementException?

Intermittent SchemaDisagreementException

Key: CASSANDRA-3884
URL: https://issues.apache.org/jira/browse/CASSANDRA-3884
Project: Cassandra
Issue Type: Bug
Components: Core
Affects Versions: 1.1.0
Environment: using ccm on ubuntu.
Reporter: Tyler Patterson
Assignee: Pavel Yaskevich

Set up a cluster of two nodes (on cassandra-1.1), create some keyspaces and
column families, and then make several schema changes. Everything is being
done through only one of the nodes. About once every 10 times (on my setup)
I get a SchemaDisagreementException when creating and dropping keyspaces.
There is a dtest for this: schema_changes_test.py. If your environment
behaves like mine, you might need to run it 10 times to get the error.

[jira] [Commented] (CASSANDRA-3884) Intermittent SchemaDisagreementException

2012-02-11 Thread Pavel Yaskevich (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-3884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13206210#comment-13206210
]

Pavel Yaskevich commented on CASSANDRA-3884:

bq. That sounds buggy to me, the goal of CASSANDRA-1391 was to make it so you
don't have to care about that kind of thing anymore.

There is still no way to make migrations atomic, so the same rule applies.
CASSANDRA-1391 made concurrent schema propagation possible, so we no longer
need to worry about the ordering (or timing) of the changes. But what happens
when you update a keyspace/column_family while simultaneously doing heavy
writes/reads is still unpredictable, because making it predictable would
require some sort of global lock while KSMetaData/CFMetaData are mutated.

Intermittent SchemaDisagreementException

[jira] [Commented] (CASSANDRA-3884) Intermittent SchemaDisagreementException

2012-02-11 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-3884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13206260#comment-13206260
 ] 

Pavel Yaskevich commented on CASSANDRA-3884:


Yes, this is what I mean, and existing CFs not involved in the schema mutation 
are not touched by the migration process.

 Intermittent SchemaDisagreementException
 


[jira] [Commented] (CASSANDRA-3886) Pig can't store some types after loading them

2012-02-10 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-3886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13205577#comment-13205577
 ] 

Pavel Yaskevich commented on CASSANDRA-3886:


I think we should make the objToBB method uniform and return in each case, e.g.:

{noformat}
private ByteBuffer objToBB(Object o)
{
    if (o == null)
        return (ByteBuffer)o;
    if (o instanceof java.lang.String)
        // wrap the bytes so this branch returns a ByteBuffer like the others
        return ByteBuffer.wrap(new DataByteArray((String)o).get());
    if (o instanceof Integer)
        // an Integer cannot be cast to BigInteger, so convert it explicitly
        return IntegerType.instance.decompose(BigInteger.valueOf((Integer)o));
    if (o instanceof Long)
        return LongType.instance.decompose((Long)o);
    if (o instanceof Float)
        return FloatType.instance.decompose((Float)o);
    if (o instanceof Double)
        return DoubleType.instance.decompose((Double)o);
    if (o instanceof UUID)
        return ByteBuffer.wrap(UUIDGen.decompose((UUID) o));

    return null;
}
{noformat}
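
A dependency-free sketch of the "return in each case" shape suggested above. 
It uses only java.nio with fixed-width big-endian encodings as a stand-in for 
Cassandra's marshallers, so the byte layouts are an assumption for 
illustration and not necessarily what the patch produces. The class and method 
names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.UUID;

final class ObjToBytesSketch
{
    // Every branch returns a ByteBuffer, so callers never see a mixed return type.
    // Absolute puts leave the position at 0, so the buffers are ready to read.
    static ByteBuffer toBytes(Object o)
    {
        if (o == null)
            return null;
        if (o instanceof String)
            return ByteBuffer.wrap(((String) o).getBytes(StandardCharsets.UTF_8));
        if (o instanceof Integer)
            return ByteBuffer.allocate(4).putInt(0, (Integer) o);
        if (o instanceof Long)
            return ByteBuffer.allocate(8).putLong(0, (Long) o);
        if (o instanceof Float)
            return ByteBuffer.allocate(4).putFloat(0, (Float) o);
        if (o instanceof Double)
            return ByteBuffer.allocate(8).putDouble(0, (Double) o);
        if (o instanceof UUID)
        {
            UUID u = (UUID) o;
            return ByteBuffer.allocate(16)
                             .putLong(0, u.getMostSignificantBits())
                             .putLong(8, u.getLeastSignificantBits());
        }
        // Assumption: fail loudly on unknown types instead of returning null.
        throw new IllegalArgumentException("Unsupported type: " + o.getClass().getName());
    }
}
```

Throwing on an unknown type is a design choice of this sketch; the snippet in 
the comment returns null in that case.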

 Pig can't store some types after loading them

[jira] [Commented] (CASSANDRA-3877) Make secondary indexes inherit compression (and maybe other properties) from their base CFS

2012-02-09 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-3877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13204750#comment-13204750
 ] 

Pavel Yaskevich commented on CASSANDRA-3877:


+1 with one nit: reloadSecondaryIndexMetadata can be made to return CFMetaData 
(this), so that instead of using a variable in CFMetaData.newIndexMetadata(...) 
it can be called in chaining style like all the other methods.

{noformat}
-                             .gcGraceSeconds(parent.gcGraceSeconds)
-                             .minCompactionThreshold(parent.minCompactionThreshold)
-                             .maxCompactionThreshold(parent.maxCompactionThreshold);
+                             .reloadSecondaryIndexMetadata(parent);
+    }
+
+    public CFMetaData reloadSecondaryIndexMetadata(CFMetaData parent)
+    {
+        gcGraceSeconds(parent.gcGraceSeconds);
+        minCompactionThreshold(parent.minCompactionThreshold);
+        maxCompactionThreshold(parent.maxCompactionThreshold);
+        compactionStrategyClass(parent.compactionStrategyClass);
+        compactionStrategyOptions(parent.compactionStrategyOptions);
+        compressionParameters(parent.compressionParameters);
+
+        return this;
{noformat}
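
To illustrate the chaining style mentioned in the nit, here is a minimal 
standalone sketch. IndexSettingsSketch and reloadFrom are hypothetical names 
standing in for CFMetaData and reloadSecondaryIndexMetadata; only three of the 
properties are modelled.

```java
final class IndexSettingsSketch
{
    int gcGraceSeconds;
    int minCompactionThreshold;
    int maxCompactionThreshold;

    // Each setter returns this so that calls can be chained.
    IndexSettingsSketch gcGraceSeconds(int v) { gcGraceSeconds = v; return this; }
    IndexSettingsSketch minCompactionThreshold(int v) { minCompactionThreshold = v; return this; }
    IndexSettingsSketch maxCompactionThreshold(int v) { maxCompactionThreshold = v; return this; }

    // Copies the inherited properties from the parent and returns this,
    // so the call can sit at the end of a chain without a temporary variable.
    IndexSettingsSketch reloadFrom(IndexSettingsSketch parent)
    {
        gcGraceSeconds(parent.gcGraceSeconds);
        minCompactionThreshold(parent.minCompactionThreshold);
        maxCompactionThreshold(parent.maxCompactionThreshold);
        return this;
    }
}
```

With this shape, a new index definition can be built as a single expression, 
for example `new IndexSettingsSketch().reloadFrom(parent)`.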

 Make secondary indexes inherit compression (and maybe other properties) from 
 their base CFS
 ---

 Key: CASSANDRA-3877
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3877
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 1.0.0
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
Priority: Minor
 Fix For: 1.1.0

 Attachments: 3877.patch


 Secondary indexes currently use the defaults for most properties and only 
 inherit a few properties from their base CFS (namely gc_grace and the min/max 
 compaction thresholds currently). It would make sense to have them inherit 
 more properties. At least compression makes sense, but the compaction 
 parameters probably do too (and maybe bf_filter_chance?).
 In any case, making secondary indexes inherit those is probably trivial, but 
 I think we should also make it so that if the base CF is modified, we mirror 
 the changes to its secondary indexes.


[jira] [Commented] (CASSANDRA-3623) use MMapedBuffer in CompressedSegmentedFile.getSegment

2012-02-08 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13203863#comment-13203863
 ] 

Pavel Yaskevich commented on CASSANDRA-3623:


+1 on closing it with wontfix.

 use MMapedBuffer in CompressedSegmentedFile.getSegment
 --

 Key: CASSANDRA-3623
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3623
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 1.1
Reporter: Vijay
Assignee: Vijay
  Labels: compression
 Fix For: 1.1

 Attachments: 0001-MMaped-Compression-segmented-file-v2.patch, 
 0001-MMaped-Compression-segmented-file-v3.patch, 
 0001-MMaped-Compression-segmented-file.patch, 
 0002-tests-for-MMaped-Compression-segmented-file-v2.patch, 
 0002-tests-for-MMaped-Compression-segmented-file-v3.patch, CRC+MMapIO.xlsx, 
 MMappedIO-Performance.docx


 CompressedSegmentedFile.getSegment seems to open a new file and doesn't seem 
 to use the MMap, hence higher CPU usage on the nodes and higher latencies on 
 reads. 
 This ticket is to implement the TODO mentioned in CompressedRandomAccessReader
 // TODO refactor this to separate concept of buffer to avoid lots of read() 
 syscalls and compression buffer
 but I think a separate class for the Buffer will be better.


[jira] [Commented] (CASSANDRA-3847) Pig should throw a useful error when the destination CF doesn't exist

2012-02-06 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-3847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201577#comment-13201577
 ] 

Pavel Yaskevich commented on CASSANDRA-3847:


+1'ed

 Pig should throw a useful error when the destination CF doesn't exist
 -

 Key: CASSANDRA-3847
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3847
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 0.7.0
Reporter: Brandon Williams
Assignee: Brandon Williams
 Fix For: 1.0.8

 Attachments: 3847.txt


 When trying to store data to a nonexistent CF, no good error is returned.
 Instead you get a message like:
 {noformat}
 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2042: Error in new 
 logical plan. Try -Dpig.usenewlogicalplan=false.
 {noformat}
 Which, if you follow its advice, will eventually lead you to an NPE in 
 initSchema.


[jira] [Commented] (CASSANDRA-3826) Pig cannot use output formats other than CFOF

2012-02-02 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-3826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13199087#comment-13199087
 ] 

Pavel Yaskevich commented on CASSANDRA-3826:


+1 with the following nit:

{code}
private String getFullyQualifiedClassName(String classname)
{
    String fqcn = classname.contains(".") ? classname : "org.apache.cassandra.hadoop." + classname;
    return fqcn;
}
{code}

can be changed to 

{code}
private String getFullyQualifiedClassName(String classname)
{
    return classname.contains(".") ? classname : "org.apache.cassandra.hadoop." + classname;
}
{code}

 Pig cannot use output formats other than CFOF
 -

 Key: CASSANDRA-3826
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3826
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Reporter: Brandon Williams
Assignee: Brandon Williams
 Fix For: 1.1

 Attachments: 3826.txt


 Pig has ColumnFamilyOutputFormat hard coded.


[jira] [Commented] (CASSANDRA-3827) nosetests / system tests fail

2012-02-01 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-3827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13198162#comment-13198162
 ] 

Pavel Yaskevich commented on CASSANDRA-3827:


You can just remove key_cache_size and row_cache_size from 
ThriftTester.define_schema(self); we don't support them anymore.

 nosetests / system tests fail
 -

 Key: CASSANDRA-3827
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3827
 Project: Cassandra
  Issue Type: Bug
  Components: Tests
Affects Versions: 1.1
Reporter: Michael Allen
 Fix For: 1.1


 CQL Driver version used: 1.0.8.
 {code}
 
 ==
 ERROR: system.test_thrift_server.TestMutations.test_bad_batch_calls
 --
 Traceback (most recent call last):
   File "/usr/local/lib/python2.7/site-packages/nose/case.py", line 381, in setUp
     try_run(self.inst, ('setup', 'setUp'))
   File "/usr/local/lib/python2.7/site-packages/nose/util.py", line 478, in try_run
     return func()
   File "/var/lib/jenkins/jobs/Cassandra/workspace/test/system/__init__.py", line 113, in setUp
     self.define_schema()
   File "/var/lib/jenkins/jobs/Cassandra/workspace/test/system/__init__.py", line 158, in define_schema
     Cassandra.CfDef('Keyspace1', 'Super1', column_type='Super', subcomparator_type='LongType', row_cache_size=1000, key_cache_size=0),
 TypeError: __init__() got an unexpected keyword argument 'key_cache_size'
 {code}


[jira] [Commented] (CASSANDRA-3826) Pig cannot use output formats other than CFOF

2012-02-01 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-3826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13198182#comment-13198182
 ] 

Pavel Yaskevich commented on CASSANDRA-3826:


I guess we had better do `if (format.contains("."))` at the time when 
{input,output}format is set, instead of in the getter methods? I would also 
suggest defining org.apache.cassandra.hadoop.ColumnFamilyInputFormat and 
org.apache.cassandra.hadoop.ColumnFamilyOutputFormat as DEFAULT_{INPUT, 
OUTPUT}_FORMAT and just assigning them to the {input,output}format variables 
when the user didn't provide any via System.getenv(...). What do you think?
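
A minimal sketch of that suggestion: qualify the name once, when it is set, 
and fall back to a default constant when nothing was configured. The class 
name, the resolve helper and the DEFAULT_PACKAGE constant are hypothetical; 
only the two default class names come from the comment above.

```java
final class FormatResolverSketch
{
    static final String DEFAULT_PACKAGE = "org.apache.cassandra.hadoop.";
    static final String DEFAULT_INPUT_FORMAT = DEFAULT_PACKAGE + "ColumnFamilyInputFormat";
    static final String DEFAULT_OUTPUT_FORMAT = DEFAULT_PACKAGE + "ColumnFamilyOutputFormat";

    // Resolves a user-supplied format name at set time:
    //  - nothing supplied -> the default
    //  - short class name -> prefixed with the default package
    //  - name with a dot  -> assumed to be fully qualified already
    static String resolve(String configured, String defaultFormat)
    {
        if (configured == null || configured.isEmpty())
            return defaultFormat;
        return configured.contains(".") ? configured : DEFAULT_PACKAGE + configured;
    }
}
```

A caller could then write something like 
`resolve(System.getenv("PIG_OUTPUT_FORMAT"), DEFAULT_OUTPUT_FORMAT)`, where the 
environment variable name is an assumption made for this example.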

 Pig cannot use output formats other than CFOF
 -

 Key: CASSANDRA-3826
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3826
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Reporter: Brandon Williams
Assignee: Brandon Williams
 Fix For: 1.0.8

 Attachments: 3826-trunk.txt, 3826.txt


 Pig has ColumnFamilyOutputFormat hard coded.


[jira] [Commented] (CASSANDRA-3827) nosetests / system tests fail

2012-02-01 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-3827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13198393#comment-13198393
 ] 

Pavel Yaskevich commented on CASSANDRA-3827:


I forgot to mention that I re-generated the Python Thrift bindings using `ant 
gen-thrift-py` before running the tests. This is probably what you need 
because, as far as I can see, KeyRange does have a `row_filter` parameter.

 nosetests / system tests fail
 -

 Key: CASSANDRA-3827
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3827
 Project: Cassandra
  Issue Type: Bug
  Components: Tests
Affects Versions: 1.1
Reporter: Michael Allen
Assignee: Pavel Yaskevich
 Fix For: 1.1

 Attachments: CASSANDRA-3827.patch



[jira] [Commented] (CASSANDRA-3827) nosetests / system tests fail

2012-02-01 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-3827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13198438#comment-13198438
 ] 

Pavel Yaskevich commented on CASSANDRA-3827:


Great! Let's wait to hear what Jonathan has to say about the code.

 nosetests / system tests fail

[jira] [Commented] (CASSANDRA-3251) CassandraStorage uses comparator for both super column names and sub column names.

2012-01-31 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13197157#comment-13197157
 ] 

Pavel Yaskevich commented on CASSANDRA-3251:


isSub = 2 leads to key_validator; it should be changed to 3. Wouldn't it be a 
good idea to pass 'AbstractType comparator' instead of 'boolean isSub' to 
columnToTuple(...)?

 CassandraStorage uses comparator for both super column names and sub column 
 names.
 --

 Key: CASSANDRA-3251
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3251
 Project: Cassandra
  Issue Type: Bug
  Components: Contrib, Hadoop
Affects Versions: 0.8.6
Reporter: Dana H. P'Simer, Jr.
Assignee: Brandon Williams
  Labels: cassandra, hadoop, pig
 Fix For: 0.8.10

 Attachments: 3251-v2.txt, CASSANDRA-3251.patch


 The CassandraStorage class uses the same comparator for super and sub column 
 names.
 This is because it calls columnsToTuple recursively without any indication 
 that the subsequent call is for sub columns.  Also, the getDefaultMarshallers 
 method does not return the sub column name comparator.


[jira] [Commented] (CASSANDRA-3804) upgrade problems from 1.0 to trunk

2012-01-30 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-3804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13196141#comment-13196141
 ] 

Pavel Yaskevich commented on CASSANDRA-3804:


This was discussed with Jonathan while working on CASSANDRA-1391: users should 
make sure that all of the nodes are upgraded to 1.1 before running any schema 
changes, because it's impossible to apply old migrations even if we accept 
them, and users will be getting the exceptions from your #2 anyway.

 upgrade problems from 1.0 to trunk
 --

 Key: CASSANDRA-3804
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3804
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.1
 Environment: ubuntu, cluster set up with ccm.
Reporter: Tyler Patterson
Assignee: Pavel Yaskevich
 Fix For: 1.1



[jira] [Commented] (CASSANDRA-3804) upgrade problems from 1.0 to trunk

2012-01-30 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-3804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13196176#comment-13196176
 ] 

Pavel Yaskevich commented on CASSANDRA-3804:


This exception (taken from Sylvain's #2) explains what will happen when you 
only partially migrate:

{noformat}
ERROR [GossipStage:1] 2012-01-30 14:35:13,363 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[GossipStage:1,5,main]
java.lang.UnsupportedOperationException: Not a time-based UUID
    at java.util.UUID.timestamp(UUID.java:308)
    at org.apache.cassandra.service.MigrationManager.updateHighestKnown(MigrationManager.java:121)
    at org.apache.cassandra.service.MigrationManager.rectify(MigrationManager.java:99)
    at org.apache.cassandra.service.MigrationManager.onAlive(MigrationManager.java:83)
    at org.apache.cassandra.gms.Gossiper.markAlive(Gossiper.java:806)
    at org.apache.cassandra.gms.Gossiper.handleMajorStateChange(Gossiper.java:849)
    at org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:908)
    at org.apache.cassandra.gms.GossipDigestAckVerbHandler.doVerb(GossipDigestAckVerbHandler.java:68)
    at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
{noformat} 

Because we switched away from time-based UUIDs for schema versions, 
MigrationManager on the old nodes will fail every time nodes with the new 
schema start up, or when they request migrations from it (because they see 
that their schema version differs from the others). Even if we fix the 
MigrationManager.rectify(...) method in 1.0.x, nodes with the new and old 
schema will never come to agreement, because the UUID types differ and because 
they are unable to run schema mutations anymore.
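
To make the failure mode concrete, here is a minimal sketch in Python (not Cassandra code). It assumes only what the stack trace above shows: a timestamp can be extracted from a time-based (version 1) UUID and from nothing else. The function name `schema_timestamp` and the name-based UUID used as the "new style" version are illustrative choices, not taken from the patch.

```python
import uuid


def schema_timestamp(version_id: uuid.UUID) -> int:
    """Mimic java.util.UUID.timestamp(): only version 1 UUIDs carry a time."""
    if version_id.version != 1:
        # Same failure mode as in the stack trace above.
        raise ValueError("Not a time-based UUID")
    return version_id.time


old_style = uuid.uuid1()                              # time-based (version 1)
new_style = uuid.uuid3(uuid.NAMESPACE_OID, "schema")  # name-based, illustrative only

print(schema_timestamp(old_style) > 0)
try:
    schema_timestamp(new_style)
except ValueError as err:
    print("old node would fail:", err)
```

Any code path on an old node that orders schema versions by timestamp hits this error as soon as it sees a version produced by a new node.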

 upgrade problems from 1.0 to trunk
 --

 Key: CASSANDRA-3804
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3804
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.1
 Environment: ubuntu, cluster set up with ccm.
Reporter: Tyler Patterson
Assignee: Pavel Yaskevich
 Fix For: 1.1


 A 3-node cluster is on version 0.8.9, 1.0.6, or 1.0.7 and then one and only 
 one node is taken down, upgraded to trunk, and started again. An rpc timeout 
 exception happens if counter-add operations are done. It usually takes 
 between 1 and 500 add operations before the failure occurs. The failure seems 
 to happen sooner if the coordinator node is NOT the one that was upgraded. 
 Here is the error: 
 {code}
 ==
 ERROR: counter_upgrade_test.TestCounterUpgrade.counter_upgrade_test
 --
 Traceback (most recent call last):
   File "/usr/lib/pymodules/python2.7/nose/case.py", line 187, in runTest
     self.test(*self.arg)
   File "/home/tahooie/cassandra-dtest/counter_upgrade_test.py", line 50, in counter_upgrade_test
     cursor.execute("UPDATE counters SET row = row+1 where key='a'")
   File "/usr/local/lib/python2.7/dist-packages/cql/cursor.py", line 96, in execute
     raise cql.OperationalError("Request did not complete within rpc_timeout.")
 OperationalError: Request did not complete within rpc_timeout.
 {code}
 A script has been added to cassandra-dtest (counter_upgrade_test.py) to 
 demonstrate the failure. The newest version of CCM is required to run the 
 test. It is available here if it hasn't yet been pulled: 
 g...@github.com:tpatterson/ccm.git


[jira] [Commented] (CASSANDRA-3804) upgrade problems from 1.0 to trunk

2012-01-30 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-3804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13196194#comment-13196194
 ] 

Pavel Yaskevich commented on CASSANDRA-3804:


bq. Never come to agreement is fine as long as normal reads/writes (against 
existing CFs) continue to work.

Reads/writes should work against existing CFs. The failures in the description 
and the first comment are related to the way cassandra-dtest works: it tries to 
re-create the schema for every test case, which won't work in a mixed-version 
cluster. If, for example, it were to create a ColumnFamily before updating one 
of the nodes to trunk, reads/writes to that ColumnFamily would still work after 
the update, even though the nodes would be in schema disagreement.

 upgrade problems from 1.0 to trunk
 --

 Key: CASSANDRA-3804
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3804
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.1
 Environment: ubuntu, cluster set up with ccm.
Reporter: Tyler Patterson
Assignee: Pavel Yaskevich
 Fix For: 1.1


 A 3-node cluster is on version 0.8.9, 1.0.6, or 1.0.7 and then one and only 
 one node is taken down, upgraded to trunk, and started again. An rpc timeout 
 exception happens if counter-add operations are done. It usually takes 
 between 1 and 500 add operations before the failure occurs. The failure seems 
 to happen sooner if the coordinator node is NOT the one that was upgraded. 
 Here is the error: 
 {code}
 ==
 ERROR: counter_upgrade_test.TestCounterUpgrade.counter_upgrade_test
 --
 Traceback (most recent call last):
   File "/usr/lib/pymodules/python2.7/nose/case.py", line 187, in runTest
     self.test(*self.arg)
   File "/home/tahooie/cassandra-dtest/counter_upgrade_test.py", line 50, in counter_upgrade_test
     cursor.execute("UPDATE counters SET row = row+1 where key='a'")
   File "/usr/local/lib/python2.7/dist-packages/cql/cursor.py", line 96, in execute
     raise cql.OperationalError("Request did not complete within rpc_timeout.")
 OperationalError: Request did not complete within rpc_timeout.
 {code}
 A script has been added to cassandra-dtest (counter_upgrade_test.py) to 
 demonstrate the failure. The newest version of CCM is required to run the 
 test. It is available here if it hasn't yet been pulled: 
 g...@github.com:tpatterson/ccm.git


[jira] [Commented] (CASSANDRA-3794) Change ColumnFamily identifiers to be UUIDs instead of sequential Integers.

2012-01-27 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-3794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13194549#comment-13194549
 ] 

Pavel Yaskevich commented on CASSANDRA-3794:


Seems like it, but the description of that one is ambiguous, so we can equally 
well change the description there and close this one, or close CASSANDRA-1983 
and work on this one.

 Change ColumnFamily identifiers to be UUIDs instead of sequential Integers.
 ---

 Key: CASSANDRA-3794
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3794
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Pavel Yaskevich
Assignee: Pavel Yaskevich
Priority: Minor
 Fix For: 1.2


 Change ColumnFamily identifiers to be UUIDs instead of sequential Integers. 
 Would be useful in the situation when nodes are simultaneously trying to 
 create ColumnFamilies.
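
A small sketch of the collision the ticket description is worried about, written in Python for brevity. The `Node` class and the starting id of 1000 are hypothetical; the point is only that two nodes drawing from their own local sequence pick the same integer, while independently generated UUIDs do not need coordination.

```python
import itertools
import uuid


class Node:
    """Hypothetical node that hands out ColumnFamily ids locally."""

    def __init__(self):
        self._next_id = itertools.count(1000)

    def next_sequential_id(self) -> int:
        return next(self._next_id)

    @staticmethod
    def next_uuid_id() -> uuid.UUID:
        return uuid.uuid4()


node_a, node_b = Node(), Node()

# Both nodes create a ColumnFamily before hearing from each other.
print(node_a.next_sequential_id() == node_b.next_sequential_id())  # ids collide
print(Node.next_uuid_id() == Node.next_uuid_id())                  # collision is extremely unlikely
```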


[jira] [Commented] (CASSANDRA-3794) Change ColumnFamily identifiers to be UUIDs instead of sequential Integers.

2012-01-27 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-3794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13194553#comment-13194553
 ] 

Pavel Yaskevich commented on CASSANDRA-3794:


I don't think so because CASSANDRA-1983 proposes to change the way we identify 
SSTables and this one proposes to change the way we identify ColumnFamilies.

 Change ColumnFamily identifiers to be UUIDs instead of sequential Integers.
 ---

 Key: CASSANDRA-3794
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3794
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Pavel Yaskevich
Assignee: Pavel Yaskevich
Priority: Minor
 Fix For: 1.2


 Change ColumnFamily identifiers to be UUIDs instead of sequential Integers. 
 Would be useful in the situation when nodes are simultaneously trying to 
 create ColumnFamilies.


[jira] [Commented] (CASSANDRA-3623) use MMapedBuffer in CompressedSegmentedFile.getSegment

2012-01-26 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13193896#comment-13193896
 ] 

Pavel Yaskevich commented on CASSANDRA-3623:


I still don't think that this is a good idea, because the tests don't show any 
significant improvement in performance and Java still has a very limited 
arsenal of functionality for working with mmap'ed files: the program doesn't 
have full control over ByteBuffers sharing mmap'ed memory, which could lead to 
problems like CASSANDRA-3179.

By the way, Vijay, maybe you weren't dropping page cache between tests? Let's 
see what Yuki has to say.

 use MMapedBuffer in CompressedSegmentedFile.getSegment
 --

 Key: CASSANDRA-3623
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3623
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 1.1
Reporter: Vijay
Assignee: Vijay
  Labels: compression
 Fix For: 1.1

 Attachments: 0001-MMaped-Compression-segmented-file-v2.patch, 
 0001-MMaped-Compression-segmented-file-v3.patch, 
 0001-MMaped-Compression-segmented-file.patch, 
 0002-tests-for-MMaped-Compression-segmented-file-v2.patch, 
 0002-tests-for-MMaped-Compression-segmented-file-v3.patch, CRC+MMapIO.xlsx, 
 MMappedIO-Performance.docx


 CompressedSegmentedFile.getSegment seems to open a new file and doesn't seem 
 to use the MMap, hence higher CPU usage on the nodes and higher latencies on 
 reads. 
 This ticket is to implement the TODO mentioned in CompressedRandomAccessReader
 // TODO refactor this to separate concept of buffer to avoid lots of read() 
 syscalls and compression buffer
 but i think a separate class for the Buffer will be better.


[jira] [Commented] (CASSANDRA-3762) AutoSaving KeyCache and System load time improvements.

2012-01-26 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13193905#comment-13193905
 ] 

Pavel Yaskevich commented on CASSANDRA-3762:


Can you please update your comment to add "key cache contains x/y keys" to the 
results on the current trunk, like you did for the "After this patch" results? 
Btw, did you drop the page cache using `sync; echo 3 > 
/proc/sys/vm/drop_caches` before running the "After this patch" tests?

 AutoSaving KeyCache and System load time improvements.
 --

 Key: CASSANDRA-3762
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3762
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 1.2
Reporter: Vijay
Assignee: Vijay
Priority: Minor
 Fix For: 1.2

 Attachments: 0001-SavedKeyCache-load-time-improvements.patch


 CASSANDRA-2392 saves the index summary to the disk... but when we have saved 
 cache we will still scan through the index to get the data out.
 We might be able to separate this from SSTR.load and let it load the index 
 summary, once all the SST's are loaded we might be able to check the 
 bloomfilter and do a random IO on fewer Index's to populate the KeyCache.


[jira] [Commented] (CASSANDRA-3762) AutoSaving KeyCache and System load time improvements.

2012-01-26 Thread Pavel Yaskevich (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13193936#comment-13193936
]

Pavel Yaskevich commented on CASSANDRA-3762:

bq. Why would that matter? Trunk reads the index even if there is 0/0 keys to
be loaded but for this test the comments for both the load is the same
(because all i did was kept restarting in between dropping the cache). The
above comments are a sample of multiple restarts... They are relative to my
environment.

Because even if it reads the whole index, we don't know how key cache pre-load
affects performance. With your results we can clearly see the thing I was
talking about: after this patch, key cache pre-load directly degrades
performance the bigger it gets.

AutoSaving KeyCache and System load time improvements.
--

Key: CASSANDRA-3762
URL: https://issues.apache.org/jira/browse/CASSANDRA-3762
Project: Cassandra
Issue Type: Improvement
Components: Core
Affects Versions: 1.2
Reporter: Vijay
Assignee: Vijay
Priority: Minor
Fix For: 1.2

Attachments: 0001-SavedKeyCache-load-time-improvements.patch

CASSANDRA-2392 saves the index summary to the disk... but when we have saved
cache we will still scan through the index to get the data out.
We might be able to separate this from SSTR.load and let it load the index
summary, once all the SST's are loaded we might be able to check the
bloomfilter and do a random IO on fewer Index's to populate the KeyCache.

[jira] [Commented] (CASSANDRA-3623) use MMapedBuffer in CompressedSegmentedFile.getSegment

2012-01-26 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13193938#comment-13193938
 ] 

Pavel Yaskevich commented on CASSANDRA-3623:


bq. The attached test results was done in complete isolation to one another.

That doesn't mean that cache drop is not required tho :)

 use MMapedBuffer in CompressedSegmentedFile.getSegment
 --

 Key: CASSANDRA-3623
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3623
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 1.1
Reporter: Vijay
Assignee: Vijay
  Labels: compression
 Fix For: 1.1

 Attachments: 0001-MMaped-Compression-segmented-file-v2.patch, 
 0001-MMaped-Compression-segmented-file-v3.patch, 
 0001-MMaped-Compression-segmented-file.patch, 
 0002-tests-for-MMaped-Compression-segmented-file-v2.patch, 
 0002-tests-for-MMaped-Compression-segmented-file-v3.patch, CRC+MMapIO.xlsx, 
 MMappedIO-Performance.docx


 CompressedSegmentedFile.getSegment seems to open a new file and doesn't seem 
 to use the MMap, hence higher CPU usage on the nodes and higher latencies on 
 reads. 
 This ticket is to implement the TODO mentioned in CompressedRandomAccessReader
 // TODO refactor this to separate concept of buffer to avoid lots of read() 
 syscalls and compression buffer
 but i think a separate class for the Buffer will be better.


[jira] [Commented] (CASSANDRA-3762) AutoSaving KeyCache and System load time improvements.

2012-01-26 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13194078#comment-13194078
 ] 

Pavel Yaskevich commented on CASSANDRA-3762:


bq. If we want to see the optimal solution for all the use cases i think we 
have to go for the alternative where we can save the Keycache position to the 
disk and read it back and what ever is missing let it fault fill. Agree?

I need to think about this option.

 AutoSaving KeyCache and System load time improvements.
 --

 Key: CASSANDRA-3762
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3762
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 1.2
Reporter: Vijay
Assignee: Vijay
Priority: Minor
 Fix For: 1.2

 Attachments: 0001-SavedKeyCache-load-time-improvements.patch


 CASSANDRA-2392 saves the index summary to the disk... but when we have saved 
 cache we will still scan through the index to get the data out.
 We might be able to separate this from SSTR.load and let it load the index 
 summary, once all the SST's are loaded we might be able to check the 
 bloomfilter and do a random IO on fewer Index's to populate the KeyCache.


[jira] [Commented] (CASSANDRA-3728) Better error message when a column family creation fails

2012-01-25 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-3728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13192945#comment-13192945
 ] 

Pavel Yaskevich commented on CASSANDRA-3728:


Works for me on the current cassandra-1.0 branch

{noformat}
[git:CASSANDRA-3728?] (~/work/java/git-cassandra) → ./bin/cassandra-cli --host localhost
Connected to: Test Cluster on localhost/9160
Welcome to Cassandra CLI version 1.0.7-SNAPSHOT

Type 'help;' or '?' for help.
Type 'quit;' or 'exit;' to quit.

[default@unknown] create keyspace ks;
71997dc0-473c-11e1--242d50cf1fdd
Waiting for schema agreement...
... schemas agree across the cluster
[default@unknown] use ks;
Authenticated to keyspace: ks
[default@ks] create column family foo-bar;
Invalid column family name: foo-bar
[default@ks]
{noformat}

Yuki, can you please test too?
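
For readers following along, the behaviour shown in the session above (a descriptive message instead of `null`) can be sketched as an up-front name check. This is a Python illustration, not the CLI's code; the rule that names may contain only word characters is an assumption made for the example, and `validate_cf_name` is a hypothetical helper.

```python
import re

# Assumption: names are restricted to word characters (letters, digits, underscore).
_VALID_NAME = re.compile(r"\w+")


def validate_cf_name(name: str) -> None:
    """Raise a descriptive error instead of letting a bad name fail later."""
    if not _VALID_NAME.fullmatch(name):
        raise ValueError("Invalid column family name: " + name)


for candidate in ("foo_bar", "foo-bar"):
    try:
        validate_cf_name(candidate)
        print(candidate, "is accepted")
    except ValueError as err:
        print(err)
```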

 Better error message when a column family creation fails
 

 Key: CASSANDRA-3728
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3728
 Project: Cassandra
  Issue Type: Bug
Reporter: Eric Lubow
Assignee: Pavel Yaskevich
Priority: Minor
  Labels: cli
 Fix For: 1.0.8


 Since '-' characters are not allowed in column family names, there should be 
 an error thrown on column family name validation.
 [default@linkcurrent] create column family foo-bar;
 null


[jira] [Commented] (CASSANDRA-2963) Add a convenient way to reset a node's schema

2012-01-25 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-2963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13192946#comment-13192946
 ] 

Pavel Yaskevich commented on CASSANDRA-2963:


Now that CASSANDRA-1391 is committed, to reset the schema you just need to 
truncate schema_{keyspaces, columnfamilies, columns} and reset Schema.instance 
to its initial (blank) state.

 Add a convenient way to reset a node's schema
 -

 Key: CASSANDRA-2963
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2963
 Project: Cassandra
  Issue Type: New Feature
  Components: Tools
Reporter: Brandon Williams
Assignee: Yuki Morishita
Priority: Minor
  Labels: lhf
 Fix For: 1.1

 Attachments: 0001-Add-resetlocalschema-to-nodetool.patch, 
 system_reset_schema.txt


 People often encounter a schema disagreement where just one node is out of 
 sync.  To get it back in sync, they shutdown the node, move the Schema* and 
 Migration* files out of the system ks, and then start it back up.  Rather 
 than go through this process, it would be nice if you could just tell the 
 node to reset its schema.


[jira] [Commented] (CASSANDRA-3559) CFMetaData conversions to Thrift/Avro should probably be inverse one of the other

2012-01-25 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-3559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13192948#comment-13192948
 ] 

Pavel Yaskevich commented on CASSANDRA-3559:


Now that CASSANDRA-1391 is committed, we can close this issue: the toAvro() 
methods were removed and the fromAvro() methods were marked as @Deprecated, so 
there is no need to change them ever again.

 CFMetaData conversions to Thrift/Avro should probably be inverse one of the 
 other
 -

 Key: CASSANDRA-3559
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3559
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
Priority: Minor
  Labels: avro, thrift
 Fix For: 1.1

 Attachments: 3559.patch


 In other words, it would probably be a good idea to have:
 {noformat}
   cfm == CFMetadata.fromThrift(cfm.toThrift())
   cfm == CFMetadata.fromAvro(cfm.toAvro())
 {noformat}
 In particular, we could have unit tests to check that, which would avoid 
 things like CASSANDRA-3558.
 It is not the case today for thrift because of the keyAlias. For some reason, 
 if the keyAlias is not set, we return with toThrift() the default alias. I 
 don't think this serves any purpose though.
 The goal of this ticket is to both fix that (unless there is a compelling 
 reason not to) and add unit tests for this.
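
The round-trip property from the description, and the way a filled-in default breaks it, can be sketched as follows. This is a Python illustration under stated assumptions: `TableMeta`, `to_wire`, `to_wire_lossy`, `from_wire` and the default alias `"KEY"` are hypothetical stand-ins, not the CFMetaData API.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_KEY_ALIAS = "KEY"  # hypothetical default, for illustration


@dataclass(frozen=True)
class TableMeta:
    """Hypothetical stand-in for CFMetaData with a single optional field."""
    name: str
    key_alias: Optional[str] = None

    def to_wire_lossy(self) -> dict:
        # Fills in the default when the alias is unset, so "unset" is lost.
        return {"name": self.name, "key_alias": self.key_alias or DEFAULT_KEY_ALIAS}

    def to_wire(self) -> dict:
        # Preserves "unset" as None, so the conversion can be inverted.
        return {"name": self.name, "key_alias": self.key_alias}

    @staticmethod
    def from_wire(wire: dict) -> "TableMeta":
        return TableMeta(wire["name"], wire["key_alias"])


cfm = TableMeta("Standard1")
print(cfm == TableMeta.from_wire(cfm.to_wire()))        # round trip holds
print(cfm == TableMeta.from_wire(cfm.to_wire_lossy()))  # round trip broken by the default
```

A unit test asserting the first equality for every field is the kind of check the ticket asks for.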


[jira] [Commented] (CASSANDRA-3559) CFMetaData conversions to Thrift/Native schema should be inverse one of the other

2012-01-25 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-3559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13192979#comment-13192979
 ] 

Pavel Yaskevich commented on CASSANDRA-3559:


+1

 CFMetaData conversions to Thrift/Native schema should be inverse one of the 
 other
 -

 Key: CASSANDRA-3559
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3559
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
  Labels: avro, thrift
 Fix For: 1.1

 Attachments: 3559-v3.patch, 3559.patch, CASSANDRA-3559-v2.patch


 In other words, it would probably be a good idea to have:
 {noformat}
   cfm == CFMetadata.fromThrift(cfm.toThrift())
   cfm == CFMetadata.fromSchema(cfm.toSchema())
 {noformat}
 In particular, we could have unit tests to check that, which would avoid 
 things like CASSANDRA-3558.
 It is not the case today for thrift because of the keyAlias. For some reason, 
 if the keyAlias is not set, we return with toThrift() the default alias. I 
 don't think this serves any purpose though.


[jira] [Commented] (CASSANDRA-2261) During Compaction, Corrupt SSTables with rows that cause failures should be identified and blacklisted.

2012-01-25 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13193195#comment-13193195
 ] 

Pavel Yaskevich commented on CASSANDRA-2261:


The patch looks good overall, but has some code style issues in 
LeveledManifest.java and SizeTieredCompactionStrategy.java.

Also, when you comment out logger.debug(...) in ColumnFamilyStore.java:1332, 
you will be able to see the following exception (one for each of the 
Keyspace1-Standard1 SSTables) shown by CompactionsTest.testBlacklisting():

{noformat}
[junit] java.lang.RuntimeException: SSTableScanner(file=RandomAccessReader(filePath='/Users/xedin/work/java/cassandra/build/test/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-hc-5-Data.db', skipIOCache=true) sstable=SSTableReader(path='build/test/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-hc-5-Data.db') exhausted=false) failed to provide next columns from KeyScanningIterator(finishedAt:0)
[junit]     at org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(SSTableScanner.java:193)
[junit]     at org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(SSTableScanner.java:146)
[junit]     at org.apache.cassandra.io.sstable.SSTableScanner.next(SSTableScanner.java:138)
[junit]     at org.apache.cassandra.io.sstable.SSTableScanner.next(SSTableScanner.java:38)
[junit]     at org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:149)
[junit]     at org.apache.cassandra.utils.MergeIterator$ManyToOne.<init>(MergeIterator.java:90)
[junit]     at org.apache.cassandra.utils.MergeIterator.get(MergeIterator.java:47)
[junit]     at org.apache.cassandra.db.compaction.CompactionIterable.iterator(CompactionIterable.java:79)
[junit]     at org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:129)
[junit]     at org.apache.cassandra.db.compaction.CompactionManager$6.runMayThrow(CompactionManager.java:260)
[junit]     at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
[junit]     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
[junit]     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
[junit]     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
[junit]     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
[junit]     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
[junit]     at java.lang.Thread.run(Thread.java:680)
[junit] Caused by: java.io.EOFException
[junit]     at java.io.RandomAccessFile.readFully(RandomAccessFile.java:399)
[junit]     at java.io.RandomAccessFile.readFully(RandomAccessFile.java:377)
[junit]     at org.apache.cassandra.io.util.RandomAccessReader.readBytes(RandomAccessReader.java:324)
[junit]     at org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:391)
[junit]     at org.apache.cassandra.utils.ByteBufferUtil.readWithShortLength(ByteBufferUtil.java:373)
[junit]     at org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(SSTableScanner.java:173)
[junit]     ... 16 more
{noformat}

Could you please investigate what causes them?
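
For context, the blacklisting idea from the ticket description (remember the name of the SSTable that failed, in memory, and leave it out of later compactions) can be sketched roughly like this. It is a Python illustration, not the patch under review; `CompactionPlanner`, `fake_read` and the file names are hypothetical.

```python
class CompactionPlanner:
    """Hypothetical planner that skips SSTables which failed a compaction."""

    def __init__(self, sstables):
        self.sstables = list(sstables)
        self.blacklist = set()  # kept in memory only, as the ticket suggests

    def candidates(self):
        return [name for name in self.sstables if name not in self.blacklist]

    def compact(self, read_sstable):
        """Try each candidate; blacklist the ones whose read raises an error."""
        compacted = []
        for name in self.candidates():
            try:
                read_sstable(name)
                compacted.append(name)
            except EOFError as err:
                print("blacklisting", name, "after error:", err)
                self.blacklist.add(name)
        return compacted


def fake_read(name):
    # Stand-in for scanning an SSTable; the "-5-" file is treated as corrupt.
    if "-5-" in name:
        raise EOFError("unexpected end of file")


planner = CompactionPlanner(["Standard1-hc-4-Data.db", "Standard1-hc-5-Data.db"])
print(planner.compact(fake_read))
print(planner.candidates())
```

The error is logged once at the point of blacklisting, so a later compaction pass no longer retries the corrupt file.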

 During Compaction, Corrupt SSTables with rows that cause failures should be 
 identified and blacklisted.
 ---

 Key: CASSANDRA-2261
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2261
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benjamin Coverston
Assignee: Benjamin Coverston
Priority: Minor
  Labels: not_a_pony
 Fix For: 1.1

 Attachments: 2261-v2.patch, 2261.patch


 When a compaction of a set of SSTables fails because of corruption it will 
 continue to try to compact that SSTable causing pending compactions to build 
 up.
 One way to mitigate this problem would be to log the error, then identify the 
 specific SSTable that caused the failure, subsequently blacklisting that 
 SSTable and ensuring that it is no longer included in future compactions. For 
 this we could simply store the problematic SSTable's name in memory.
 If it's not possible to identify the SSTable that caused the issue, then 
 perhaps blacklisting the (ordered) permutation of SSTables to be compacted 
 together is something that can be done to solve this problem in a more 
 general case, and avoid issues where two (or more) SSTables have trouble 
 compacting a particular row. For this option we would probably want to store 
 the lists of the bad combinations in the system table somewhere s.t. these

[jira] [Commented] (CASSANDRA-1391) Allow Concurrent Schema Migrations

2012-01-24 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13192300#comment-13192300
 ] 

Pavel Yaskevich commented on CASSANDRA-1391:


Absolutely! I'm finishing up a few last things and will attach a patch in a few 
hours.

 Allow Concurrent Schema Migrations
 --

 Key: CASSANDRA-1391
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1391
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 0.7.0
Reporter: Stu Hood
Assignee: Pavel Yaskevich
Priority: Critical
 Fix For: 1.1

 Attachments: 0001-CASSANDRA-1391-main.patch, 
 0002-CASSANDRA-1391-fixes.patch, 1391-rebased.txt, CASSANDRA-1391.patch


 CASSANDRA-1292 fixed multiple migrations started from the same node to 
 properly queue themselves, but it is still possible for migrations initiated 
 on different nodes to conflict and leave the cluster in a bad state. Since 
 the system_add/drop/rename methods are accessible directly from the client 
 API, they should be completely safe for concurrent use.
 It should be possible to allow for most types of concurrent migrations by 
 converting the UUID schema ID into a VersionVectorClock (as provided by 
 CASSANDRA-580).


[jira] [Commented] (CASSANDRA-3759) [patch] don't allow dropping the system keyspace

2012-01-24 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-3759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13192411#comment-13192411
 ] 

Pavel Yaskevich commented on CASSANDRA-3759:


I have tested my patch on a new installation, and it allows creating a keyspace 
without already using one (which was throwing an NPE before).

 [patch] don't allow dropping the system keyspace
 

 Key: CASSANDRA-3759
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3759
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Dave Brosius
Assignee: Dave Brosius
Priority: Trivial
 Fix For: 1.0.8

 Attachments: CASSANDRA-3759-npe-fix.patch, no_drop_system.diff


 throw an IRE if user attempts to drop system keyspace


[jira] [Commented] (CASSANDRA-3762) AutoSaving KeyCache and System load time improvements.

2012-01-24 Thread Pavel Yaskevich (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13192595#comment-13192595
]

Pavel Yaskevich commented on CASSANDRA-3762:

With this patch we trade a whole sequential primary_index read for random I/O
with SSTableReader.getPosition(), done only for the number of saved keys. Can
you extend the key cache, let's make it 75% of the keys, and run your test
again? I think the closer the key cache size gets to the actual number of keys,
the worse performance will get...
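
A back-of-the-envelope model of that trade-off, as a Python sketch. It assumes one seek per saved key (pessimistic, since nearby keys may share a page) and uses made-up figures purely for illustration; none of the numbers are measurements from this ticket.

```python
def sequential_scan_seconds(index_bytes: float, seq_bytes_per_s: float) -> float:
    """Time to read the whole primary index front to back."""
    return index_bytes / seq_bytes_per_s


def random_lookup_seconds(saved_keys: int, seek_s: float) -> float:
    """Time to look up each saved key with one seek (a pessimistic assumption)."""
    return saved_keys * seek_s


def break_even_keys(index_bytes: float, seq_bytes_per_s: float, seek_s: float) -> float:
    """Number of saved keys at which both approaches cost the same."""
    return sequential_scan_seconds(index_bytes, seq_bytes_per_s) / seek_s


# Illustrative figures only, not measurements: a 100 MB index, 100 MB/s, 10 ms per seek.
index_bytes, throughput, seek = 100e6, 100e6, 0.01
print(break_even_keys(index_bytes, throughput, seek))
```

Under this model the random-I/O approach wins only while the saved key count stays below the break-even value, which is why a larger key cache is the interesting test case.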

AutoSaving KeyCache and System load time improvements.
--

Attachments: 0001-SavedKeyCache-load-time-improvements.patch

[jira] [Commented] (CASSANDRA-3762) AutoSaving KeyCache and System load time improvements.

2012-01-24 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13192654#comment-13192654
 ] 

Pavel Yaskevich commented on CASSANDRA-3762:


I mention this because the problem in the original ticket was rolling restarts 
taking too much time on index summary computation (reading through the whole 
PrimaryIndex for every SSTable out there). Imagine a situation where you have a 
few hundred SSTables, each with cached keys in different parts of the primary 
index: if you go with getPosition() calls you will have a lot of random I/O on 
each of them (you will have to seek deeper and deeper into the primary index 
file, which means slower data access even in mmap mode). I'm not sure that is 
really better than reading the primary index sequentially, especially knowing 
that you have already read all of the index/data positions from the Summary 
component. I propose you run the test with many SSTables and compare system 
load times (don't forget to drop the page cache between tests with 
`sync; echo 3 > /proc/sys/vm/drop_caches`).

By the way, I forgot to ask: did you drop the page cache before running the 
second test? If you didn't, that would pretty much explain such a dramatic 
improvement in the load time...

 AutoSaving KeyCache and System load time improvements.
 --

 Key: CASSANDRA-3762
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3762
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 1.2
Reporter: Vijay
Assignee: Vijay
Priority: Minor
 Fix For: 1.2

 Attachments: 0001-SavedKeyCache-load-time-improvements.patch


 CASSANDRA-2392 saves the index summary to the disk... but when we have saved 
 cache we will still scan through the index to get the data out.
 We might be able to separate this from SSTR.load and let it load the index 
 summary, once all the SST's are loaded we might be able to check the 
 bloomfilter and do a random IO on fewer Index's to populate the KeyCache.


[jira] [Commented] (CASSANDRA-3762) AutoSaving KeyCache and System load time improvements.

2012-01-24 Thread Pavel Yaskevich (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13192668#comment-13192668
]

Pavel Yaskevich commented on CASSANDRA-3762:

bq. No, it is basically an average of multiple runs without any additional
writes... start and immediately stop and compare the logs... both tests are the
same. Again, it is on my laptop.

Which means the more you run, the more data gets cached, which affects the
results. I would suggest dropping the page cache every time you run each of the
tests to get cleaner load time values when any I/O is involved.

AutoSaving KeyCache and System load time improvements.
--

Attachments: 0001-SavedKeyCache-load-time-improvements.patch

[jira] [Commented] (CASSANDRA-2392) Saving IndexSummaries to disk

2012-01-24 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13192673#comment-13192673
 ] 

Pavel Yaskevich commented on CASSANDRA-2392:


Great, now that we have this one ready, we really need to finish 
CASSANDRA-3762 to find out whether this design makes sense or whether we should 
go with another strategy.

 Saving IndexSummaries to disk
 -

 Key: CASSANDRA-2392
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2392
 Project: Cassandra
  Issue Type: Improvement
Reporter: Chris Goffinet
Assignee: Vijay
Priority: Minor
 Fix For: 1.2

 Attachments: 0001-CASSANDRA-2392-v6.patch, 
 0001-re-factor-first-and-last.patch, 0001-save-summaries-to-disk-v4.patch, 
 0001-save-summaries-to-disk.patch, 0002-save-summaries-to-disk-v2.patch, 
 0002-save-summaries-to-disk-v3.patch, 0002-save-summaries-to-disk.patch, 
 CASSANDRA-2392-v5.patch


 For nodes with millions of keys, doing rolling restarts that take over 10 
 minutes per node can be painful if you have 100 node cluster. All of our time 
 is spent on doing index summary computations on startup. It would be great if 
 we could save those to disk as well. Our indexes are quite large.


[jira] [Commented] (CASSANDRA-3759) [patch] don't allow dropping the system keyspace

2012-01-24 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-3759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13192696#comment-13192696
 ] 

Pavel Yaskevich commented on CASSANDRA-3759:


No, users won't be able to drop or make any modifications to the system 
keyspace using the CQL or Thrift interfaces, but they will be able to 
create/update/drop other keyspaces while the system keyspace is in use.

 [patch] don't allow dropping the system keyspace
 

 Key: CASSANDRA-3759
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3759
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Dave Brosius
Assignee: Dave Brosius
Priority: Trivial
 Fix For: 1.0.8

 Attachments: CASSANDRA-3759-fix-create-update-drop-keyspaces.patch, 
 CASSANDRA-3759-npe-fix.patch, no_drop_system.diff


 throw an IRE if user attempts to drop system keyspace


[jira] [Commented] (CASSANDRA-3761) CQL 3.0

2012-01-23 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-3761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13191187#comment-13191187
 ] 

Pavel Yaskevich commented on CASSANDRA-3761:


Cql.g looks good.

 CQL 3.0
 ---

 Key: CASSANDRA-3761
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3761
 Project: Cassandra
  Issue Type: New Feature
  Components: API
Reporter: Sylvain Lebresne
  Labels: cql
 Fix For: 1.1

 Attachments: 0001-CQL-3.0.patch, 
 0002-Add-support-for-switching-the-CQL-version.patch, 
 0003-Makes-batches-atomic.patch, 0004-Thrift-gen-files.patch, cql_tests.py, 
 create_cf_syntaxes.txt


 This ticket is a reformulation/generalization of CASSANDRA-2474. The core 
 change of CQL 3.0 is to introduce the new syntaxes that were discussed in 
 CASSANDRA-2474 that allow to:
 # Provide a better/more native support for wide rows, using the idea of 
 transposed view.
 # The generalization to composite columns.
 The attached text file create_cf_syntaxes.txt recall the new syntaxes 
 introduced.
 The changes proposed above allow (and strongly suggest in some cases) a 
 number of other changes to the language that this ticket proposes to 
 explore/implement (more details coming in the comments).


[jira] [Commented] (CASSANDRA-2392) Saving IndexSummaries to disk

2012-01-22 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13190740#comment-13190740
 ] 

Pavel Yaskevich commented on CASSANDRA-2392:


Here are the last things with v3:

 - {load, save}Summaries methods are leaking file descriptors because {o, 
i}Stream is closed only when the method handles an IOException. 

 Nit: 

{code}
+FileInputStream input = new FileInputStream(inMemoryDataFile);
+iStream = new DataInputStream(input);
{code}
and
{code}
+FileOutputStream input = new FileOutputStream(summaryFile);
+oStream = new DataOutputStream(input);
{code}

can be changed to 
{noformat}
{i,o}Stream = new Data{Input, Output}Stream(new File{Input, Output}Stream(summaryFile));
{noformat}
because the input var is not really needed.
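
A minimal sketch of one way to avoid the descriptor leak described above, assuming a hypothetical SummaryIO helper (the class and method names are illustrative and not taken from the patch). The stream is closed in a finally block, so the descriptor is released whether or not an IOException is thrown:

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

// Hypothetical helper for illustration only; names are not from the patch.
class SummaryIO
{
    // Writes a single long and always closes the stream, even on failure.
    static void save(File summaryFile, long value) throws IOException
    {
        DataOutputStream oStream = new DataOutputStream(new FileOutputStream(summaryFile));
        try
        {
            oStream.writeLong(value);
        }
        finally
        {
            oStream.close(); // runs on both the success and the failure path
        }
    }

    // Reads the long back; the descriptor is released in the finally block.
    static long load(File summaryFile) throws IOException
    {
        DataInputStream iStream = new DataInputStream(new FileInputStream(summaryFile));
        try
        {
            return iStream.readLong();
        }
        finally
        {
            iStream.close();
        }
    }
}
```

Closing the outer Data{Input, Output}Stream also closes the wrapped file stream, which is why no separate variable for the inner stream is required.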

I don't think that 0001-re-factor-first-and-last is a good idea because by 
moving the first/last variables to IndexSummary you change their semantics: 
they no longer indicate the first and last key that the SSTable holds but 
rather the first/last key covered by the IndexSummary of the individual 
SSTable, so I think we really should just keep those variables in the old place.

Also, I'm concerned that CASSANDRA-3762 is marked for 1.2 and this one for 1.1, 
because if we don't get them into one release, start-up times could become even 
longer than they are right now, which defeats the point of the current task, 
since there is a big chance that the key cache would be enabled on the big 
ColumnFamilies.

 Saving IndexSummaries to disk
 -

 Key: CASSANDRA-2392
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2392
 Project: Cassandra
  Issue Type: Improvement
Reporter: Chris Goffinet
Assignee: Vijay
Priority: Minor
 Fix For: 1.1

 Attachments: 0001-re-factor-first-and-last.patch, 
 0001-save-summaries-to-disk.patch, 0002-save-summaries-to-disk-v2.patch, 
 0002-save-summaries-to-disk-v3.patch, 0002-save-summaries-to-disk.patch


 For nodes with millions of keys, doing rolling restarts that take over 10 
 minutes per node can be painful if you have 100 node cluster. All of our time 
 is spent on doing index summary computations on startup. It would be great if 
 we could save those to disk as well. Our indexes are quite large.


[jira] [Commented] (CASSANDRA-2392) Saving IndexSummaries to disk

2012-01-22 Thread Pavel Yaskevich (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13190751#comment-13190751
]

Pavel Yaskevich commented on CASSANDRA-2392:

bq. But the main idea is to reduce the code and the checks which we have to do
just to populate the first and last variables. IMO it is better served in Index
Summary, which already has the needed checks. By using maybeAddEntry() and
marking the others private everywhere we don't need extra checks elsewhere to
populate the fields... first and last in an index is also a summary :)

Correct me if I'm wrong, but as I see in SSTableReader.load(...), the condition
SSTable.last == IndexSummary.last is not guaranteed, which means that
IndexSummary.last has different semantics from SSTable.last. As for the
checks - I don't see many of those, and IndexSummary in its current state does
not have anything to do with the SSTable's last/first variables, so I don't
really understand which checks you are talking about. If you really want to be
pedantic about the domain of first/last - I agree that they could belong to the
summary of the SSTable but certainly not to the index one :)
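
To illustrate why the two "last" values can differ, here is a small sketch under the assumption that a summary keeps only every N-th key of the sorted index. The SamplingSketch class and its sample method are hypothetical and are not the actual IndexSummary code:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sampling sketch; the interval and all names are assumptions.
class SamplingSketch
{
    // Keeps every 'interval'-th key, starting with the first one.
    static List<String> sample(List<String> sortedKeys, int interval)
    {
        List<String> sampled = new ArrayList<String>();
        for (int i = 0; i < sortedKeys.size(); i += interval)
            sampled.add(sortedKeys.get(i));
        return sampled;
    }
}
```

With ten keys k0..k9 and an interval of 4, the sampled keys are k0, k4 and k8, so the last sampled key is not the last key of the table unless the key count happens to line up with the interval.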

bq. Because we read from the disk to populate the Index Summary? If yes i can
make sure that both the patches go into the same release.

Because we would end up reading more data (e.g. some of the keys and all index
and data positions would be read twice) from different files - primary_index
and summary.

Saving IndexSummaries to disk
-

Key: CASSANDRA-2392
URL: https://issues.apache.org/jira/browse/CASSANDRA-2392
Project: Cassandra
Issue Type: Improvement
Reporter: Chris Goffinet
Assignee: Vijay
Priority: Minor
Fix For: 1.1

Attachments: 0001-re-factor-first-and-last.patch,
0001-save-summaries-to-disk.patch, 0002-save-summaries-to-disk-v2.patch,
0002-save-summaries-to-disk-v3.patch, 0002-save-summaries-to-disk.patch

For nodes with millions of keys, doing rolling restarts that take over 10
minutes per node can be painful if you have 100 node cluster. All of our time
is spent on doing index summary computations on startup. It would be great if
we could save those to disk as well. Our indexes are quite large.

[jira] [Commented] (CASSANDRA-2392) Saving IndexSummaries to disk

2012-01-22 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13190776#comment-13190776
 ] 

Pavel Yaskevich commented on CASSANDRA-2392:


Also, the changes shown below are odd because you get the same behavior by 
keeping the code as it was: it will just throw an exception somewhere in the 
try block, run the code in the finally block and never get to the 
indexSummary.complete() and {i,d}builder.complete(String) methods, which are a 
no-op in that case. Btw, indexSummary.complete() can be moved out of the try 
block because it doesn't throw IOException and is a no-op if the code above it 
does, but that is not a big deal anyway.


{noformat}
+catch (IOException ex)
+{
+exception = true;
+throw ex;
 }
 finally
 {
+// close the file first.
 FileUtils.closeQuietly(input);
+if (!exception)
+{
+// finalize the load.
+indexSummary.complete();
+// finalize the state of the reader
+ifile = ibuilder.complete(descriptor.filenameFor(Component.PRIMARY_INDEX));
+dfile = dbuilder.complete(descriptor.filenameFor(Component.DATA));
+}
 }
{noformat}
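
The control flow described above can be shown with a small sketch (hypothetical class and step names, not the patch itself): when the try block throws, the finally block still runs, and statements placed after the try/finally are skipped without any explicit exception flag.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo; the step names only mirror the discussion above.
class LoadFlow
{
    // Records which steps ran; 'fail' simulates an IOException while reading.
    static List<String> load(boolean fail)
    {
        List<String> steps = new ArrayList<String>();
        try
        {
            try
            {
                steps.add("read");
                if (fail)
                    throw new IOException("simulated read failure");
            }
            finally
            {
                steps.add("close"); // always runs
            }
            // Reached only when nothing was thrown above.
            steps.add("complete");
        }
        catch (IOException e)
        {
            steps.add("failed");
        }
        return steps;
    }
}
```

A successful run records read, close, complete; a failing run records read, close, failed, so the "complete" step is skipped on its own.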

 Saving IndexSummaries to disk
 -

 Key: CASSANDRA-2392
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2392
 Project: Cassandra
  Issue Type: Improvement
Reporter: Chris Goffinet
Assignee: Vijay
Priority: Minor
 Fix For: 1.1

 Attachments: 0001-re-factor-first-and-last.patch, 
 0001-save-summaries-to-disk.patch, 0002-save-summaries-to-disk-v2.patch, 
 0002-save-summaries-to-disk-v3.patch, 0002-save-summaries-to-disk.patch


 For nodes with millions of keys, doing rolling restarts that take over 10 
 minutes per node can be painful if you have 100 node cluster. All of our time 
 is spent on doing index summary computations on startup. It would be great if 
 we could save those to disk as well. Our indexes are quite large.


[jira] [Commented] (CASSANDRA-2392) Saving IndexSummaries to disk

2012-01-22 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13190814#comment-13190814
 ] 

Pavel Yaskevich commented on CASSANDRA-2392:


Thanks for the patch, going to take a look soon! Will you be able to deal with 
CASSANDRA-3762 in time for the 1.1 release? That way we will be able to move it 
and this one back to 1.1.

 Saving IndexSummaries to disk
 -

 Key: CASSANDRA-2392
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2392
 Project: Cassandra
  Issue Type: Improvement
Reporter: Chris Goffinet
Assignee: Vijay
Priority: Minor
 Fix For: 1.2

 Attachments: 0001-re-factor-first-and-last.patch, 
 0001-save-summaries-to-disk-v4.patch, 0001-save-summaries-to-disk.patch, 
 0002-save-summaries-to-disk-v2.patch, 0002-save-summaries-to-disk-v3.patch, 
 0002-save-summaries-to-disk.patch


 For nodes with millions of keys, doing rolling restarts that take over 10 
 minutes per node can be painful if you have 100 node cluster. All of our time 
 is spent on doing index summary computations on startup. It would be great if 
 we could save those to disk as well. Our indexes are quite large.


[jira] [Commented] (CASSANDRA-1391) Allow Concurrent Schema Migrations

2012-01-21 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13190403#comment-13190403
 ] 

Pavel Yaskevich commented on CASSANDRA-1391:


Jonathan: Sure, I will do that, although wouldn't it be better to name the 
ColumnFamilies using camel case, like SchemaKeyspaces, SchemaColumnFamilies, 
SchemaColumns, instead?

Brandon: can you please describe the situation when that happened? Have you 
deleted all of the columns in the update? It seems like I just forgot to add an 
"if (columnDefs == null) return empty map;" case to the 
ColumnDefinition.toMap(List<ColumnDef>) method.
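
A minimal sketch of the null guard mentioned above, using hypothetical stand-in types (each column is a {name, validator} string pair here; the real ColumnDef and ColumnDefinition classes differ):

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in: each column is a {name, validator} pair, not the real ColumnDef type.
class ColumnDefsSketch
{
    // Builds a name -> validator map; a null list yields an empty map instead of an NPE.
    static Map<String, String> toMap(List<String[]> columnDefs)
    {
        if (columnDefs == null)
            return Collections.emptyMap();

        Map<String, String> result = new HashMap<String, String>();
        for (String[] def : columnDefs)
            result.put(def[0], def[1]);
        return result;
    }
}
```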

 Allow Concurrent Schema Migrations
 --

 Key: CASSANDRA-1391
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1391
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 0.7.0
Reporter: Stu Hood
Assignee: Pavel Yaskevich
Priority: Critical
 Fix For: 1.1

 Attachments: 0001-CASSANDRA-1391-main.patch, 
 0002-CASSANDRA-1391-fixes.patch, 1391-rebased.txt, CASSANDRA-1391.patch


 CASSANDRA-1292 fixed multiple migrations started from the same node to 
 properly queue themselves, but it is still possible for migrations initiated 
 on different nodes to conflict and leave the cluster in a bad state. Since 
 the system_add/drop/rename methods are accessible directly from the client 
 API, they should be completely safe for concurrent use.
 It should be possible to allow for most types of concurrent migrations by 
 converting the UUID schema ID into a VersionVectorClock (as provided by 
 CASSANDRA-580).


[jira] [Commented] (CASSANDRA-1391) Allow Concurrent Schema Migrations

2012-01-21 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13190427#comment-13190427
 ] 

Pavel Yaskevich commented on CASSANDRA-1391:


Ok, gotcha :) I will add the null check to ColumnDefinition that I mentioned 
previously, and re-test everything once I'm done with the changes requested by 
Jonathan.

 Allow Concurrent Schema Migrations
 --

 Key: CASSANDRA-1391
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1391
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 0.7.0
Reporter: Stu Hood
Assignee: Pavel Yaskevich
Priority: Critical
 Fix For: 1.1

 Attachments: 0001-CASSANDRA-1391-main.patch, 
 0002-CASSANDRA-1391-fixes.patch, 1391-rebased.txt, CASSANDRA-1391.patch


 CASSANDRA-1292 fixed multiple migrations started from the same node to 
 properly queue themselves, but it is still possible for migrations initiated 
 on different nodes to conflict and leave the cluster in a bad state. Since 
 the system_add/drop/rename methods are accessible directly from the client 
 API, they should be completely safe for concurrent use.
 It should be possible to allow for most types of concurrent migrations by 
 converting the UUID schema ID into a VersionVectorClock (as provided by 
 CASSANDRA-580).


[jira] [Commented] (CASSANDRA-1391) Allow Concurrent Schema Migrations

2012-01-21 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13190501#comment-13190501
 ] 

Pavel Yaskevich commented on CASSANDRA-1391:


I'm fine with dropping the schema_ prefix and going with keyspaces and 
columnfamilies, but how do we name the columns CF then? Something like 
columnfamily_columns?

 Allow Concurrent Schema Migrations
 --

 Key: CASSANDRA-1391
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1391
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 0.7.0
Reporter: Stu Hood
Assignee: Pavel Yaskevich
Priority: Critical
 Fix For: 1.1

 Attachments: 0001-CASSANDRA-1391-main.patch, 
 0002-CASSANDRA-1391-fixes.patch, 1391-rebased.txt, CASSANDRA-1391.patch


 CASSANDRA-1292 fixed multiple migrations started from the same node to 
 properly queue themselves, but it is still possible for migrations initiated 
 on different nodes to conflict and leave the cluster in a bad state. Since 
 the system_add/drop/rename methods are accessible directly from the client 
 API, they should be completely safe for concurrent use.
 It should be possible to allow for most types of concurrent migrations by 
 converting the UUID schema ID into a VersionVectorClock (as provided by 
 CASSANDRA-580).


[jira] [Commented] (CASSANDRA-1391) Allow Concurrent Schema Migrations

2012-01-21 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13190534#comment-13190534
 ] 

Pavel Yaskevich commented on CASSANDRA-1391:


Works for me, so be it schema_keyspaces, schema_columnfamilies and 
schema_columns.

 Allow Concurrent Schema Migrations
 --

 Key: CASSANDRA-1391
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1391
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 0.7.0
Reporter: Stu Hood
Assignee: Pavel Yaskevich
Priority: Critical
 Fix For: 1.1

 Attachments: 0001-CASSANDRA-1391-main.patch, 
 0002-CASSANDRA-1391-fixes.patch, 1391-rebased.txt, CASSANDRA-1391.patch


 CASSANDRA-1292 fixed multiple migrations started from the same node to 
 properly queue themselves, but it is still possible for migrations initiated 
 on different nodes to conflict and leave the cluster in a bad state. Since 
 the system_add/drop/rename methods are accessible directly from the client 
 API, they should be completely safe for concurrent use.
 It should be possible to allow for most types of concurrent migrations by 
 converting the UUID schema ID into a VersionVectorClock (as provided by 
 CASSANDRA-580).


[jira] [Commented] (CASSANDRA-2392) Saving IndexSummaries to disk

2012-01-20 Thread Pavel Yaskevich (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13189802#comment-13189802
]

Pavel Yaskevich commented on CASSANDRA-2392:

bq. Renamed and done recommended changes. Except we have all the in-memory
data-structures in one file instead of multiple files. They are handled
differently and will be kind of throw away data so we can regenerate it.

I kind of liked it more when the component was Summary because InMemoryData
doesn't really tell what is inside. Please rename SegmentedFile
serialize/deserialize to something like serializeBounds/deserializeBounds.

bq. I do see Keycache working in my tests...

Sorry, I wasn't clear when I said that. It seems that summary save/load is
pointless in its current form because even if we have loaded the summary from
disk we would still have to loop through the *whole* PRIMARY_INDEX if pre-cache
(which is always enabled by default) or re-create-BloomFilter was enabled,
which practically means that we spend the same time on I/O there as
ibuilder.deserialize and dbuilder.deserialize together. We would need to change
the logic in SSTableReader.load(boolean, Set<DecoratedKey>) so that it doesn't
have such I/O overhead, because otherwise this will make it even slower compared
to the time it takes now.

Saving IndexSummaries to disk
-

Key: CASSANDRA-2392
URL: https://issues.apache.org/jira/browse/CASSANDRA-2392
Project: Cassandra
Issue Type: Improvement
Reporter: Chris Goffinet
Assignee: Vijay
Priority: Minor
Fix For: 1.1

Attachments: 0001-re-factor-first-and-last.patch,
0001-save-summaries-to-disk.patch, 0002-save-summaries-to-disk-v2.patch,
0002-save-summaries-to-disk.patch

[jira] [Commented] (CASSANDRA-2392) Saving IndexSummaries to disk

2012-01-19 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13189046#comment-13189046
 ] 

Pavel Yaskevich commented on CASSANDRA-2392:


Thanks for the patch! Here is my review:

- Index summaries load in SSTableReader.load(boolean, Set<DecoratedKey>) breaks 
key cache pre-load.

- The IndexSummary deserialize(...) method should be made static and return an 
IndexSummary object. This will also allow dropping the IndexSummary argument 
from SSTableReader.loadSummaries(...).

- To avoid any seeks in the PRIMARY_INDEX file upon IndexSummary.deserialize, I 
suggest saving the key (only the BB part) as well as the index position in 
IndexSummary.serialize.

- I would also suggest saving dataPosition from the primary index into the 
summaries file to avoid adding serialization to SegmentedFile, because 
SegmentedFile serialize(...)/deserialize(...) are not really a 
serialize/deserialize - they just save/read boundaries. This way you would be 
able to do deserialization and boundary load at the same time without 
saving/reading additional information to/from the disk, because only ibuilder 
needs indexPosition and only dbuilder needs dataPosition.

- loadSummaries should be renamed to something more appropriate because that 
method does not only load index summaries, it also loads the index and data 
builders; per se it does not really load them but rather just deserializes 
boundaries into an existing object, which is not a good practice.

- can you please explain this chunk of code to me?
{code}
+// don't rename summaries as it is not created yet and created while it is loaded.
+for (Component component : Sets.difference(components, Sets.newHashSet(Component.DATA, Component.SUMMARIES)))
  FBUtilities.renameWithConfirm(tmpdesc.filenameFor(component), newdesc.filenameFor(component));
{code}



 Saving IndexSummaries to disk
 -

 Key: CASSANDRA-2392
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2392
 Project: Cassandra
  Issue Type: Improvement
Reporter: Chris Goffinet
Assignee: Vijay
Priority: Minor
 Fix For: 1.1

 Attachments: 0001-re-factor-first-and-last.patch, 
 0001-save-summaries-to-disk.patch, 0002-save-summaries-to-disk.patch


 For nodes with millions of keys, doing rolling restarts that take over 10 
 minutes per node can be painful if you have 100 node cluster. All of our time 
 is spent on doing index summary computations on startup. It would be great if 
 we could save those to disk as well. Our indexes are quite large.


[jira] [Commented] (CASSANDRA-2392) Saving IndexSummaries to disk

2012-01-19 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13189299#comment-13189299
 ] 

Pavel Yaskevich commented on CASSANDRA-2392:


bq. I am not sure how saving dataPosition will help as we only have summaries 
between 128Keys or more and how will we mark a boundary with it? For example 
each row is 100MB big.

Oh yes, you are right, we really need all boundary information from segmented 
files, my bad.


 Saving IndexSummaries to disk
 -

 Key: CASSANDRA-2392
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2392
 Project: Cassandra
  Issue Type: Improvement
Reporter: Chris Goffinet
Assignee: Vijay
Priority: Minor
 Fix For: 1.1

 Attachments: 0001-re-factor-first-and-last.patch, 
 0001-save-summaries-to-disk.patch, 0002-save-summaries-to-disk.patch


 For nodes with millions of keys, doing rolling restarts that take over 10 
 minutes per node can be painful if you have 100 node cluster. All of our time 
 is spent on doing index summary computations on startup. It would be great if 
 we could save those to disk as well. Our indexes are quite large.


[jira] [Commented] (CASSANDRA-1391) Allow Concurrent Schema Migrations

2012-01-19 Thread Pavel Yaskevich (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13189511#comment-13189511
]

Pavel Yaskevich commented on CASSANDRA-1391:

bq. validateSchemaAgreement is unnecessary now right?

I think it's still a good idea to validate if all nodes have the same schema.

bq. the old Migration infrastructure feels unnecessarily heavyweight now. Can
we move the validation into the CassandraServer methods, and then just invoke a
MigrationHelper method from a runnable there?

I tried to optimize it as much as possible, because I still think there is a
reason to keep it: it encapsulates all of the announce, apply and validation
logic pretty well. I tried to move the validation and related code to
CassandraServer, but the result was hardly readable and heavyweight.

bq. should we snapshot the old avro schema before nuking it?

MigrationHelper.dropColumnFamily, which I call to remove the Migrations and
Schema CFs, makes a snapshot of the data.

bq. SystemTable.dropOldSchemaTables is a no-op. I think we can take this out
entirely since loadSchema/fromAvro takes care of it?

Ugh, I forgot to remove it from the final version of the patch, sorry...

bq. Can you add a comment describing the layout of the new schema CFs to
defstable or systemtable?

Sure, I will do that in SystemTable.

bq. I'd prefer to leave the low level slicing / deserialize in SystemTable
class instead of scattered between Schema and DefsTable

Sure, I will move serialize and serialized methods from Schema to SystemTable,
plus DefsTable.readSchemaRow and getSchema also go there.

Allow Concurrent Schema Migrations
--

Key: CASSANDRA-1391
URL: https://issues.apache.org/jira/browse/CASSANDRA-1391
Project: Cassandra
Issue Type: Improvement
Components: Core
Affects Versions: 0.7.0
Reporter: Stu Hood
Assignee: Pavel Yaskevich
Priority: Critical
Fix For: 1.1

Attachments: 1391-rebased.txt, CASSANDRA-1391.patch

CASSANDRA-1292 fixed multiple migrations started from the same node to
properly queue themselves, but it is still possible for migrations initiated
on different nodes to conflict and leave the cluster in a bad state. Since
the system_add/drop/rename methods are accessible directly from the client
API, they should be completely safe for concurrent use.
It should be possible to allow for most types of concurrent migrations by
converting the UUID schema ID into a VersionVectorClock (as provided by
CASSANDRA-580).
