[jira] [Updated] (CASSANDRA-3734) Support native link w/o JNA in Java7

2012-02-04 Thread Peter Schuller (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Schuller updated CASSANDRA-3734:
--

Attachment: CASSANDRA-3734-trunk-v1.txt

Attached patch. It creates a {{NativeFileSystem}} interface with {{Java6}} and 
{{Java7}} implementations, and adds {{FileUtils.createHardLink()}}. The decision 
on which backend to use is made during static initialization of {{FileUtils}}, 
based on whether the nio2 classes appear to be available.
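
For reference, a minimal sketch of what the Java7 backend boils down to (illustrative 
only; the class and method names here are assumed, not necessarily those in the 
attached patch):

{code}
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

// Hypothetical Java7-only backend: delegates to nio2, which performs the
// native hard-link call without needing JNA.
public class Java7NativeFileSystem
{
    public void createHardLink(File existing, File link) throws IOException
    {
        // Files.createLink takes the new link first and the existing file second.
        Files.createLink(link.toPath(), existing.toPath());
    }
}
{code}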

Broke with tradition and named the arguments "existing" and "link" instead of 
"source" and "dest", to make it harder to confuse the "direction" of linking.

While at it, added a temporary-directory creation utility to FileUtils (instead 
of using the racy create-file/delete/mkdir approach).
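
For illustration, a non-racy variant that still works on Java 6 is to retry mkdir() 
with fresh random names until one succeeds, since mkdir() itself atomically claims 
the name. The helper below is a hypothetical sketch, not the code in the attached 
patch:

{code}
import java.io.File;
import java.io.IOException;
import java.util.UUID;

public final class TempDirs
{
    // Sketch only: avoids the window between deleting the temp file and
    // creating the directory that makes the create-file/delete/mkdir idiom racy.
    public static File createTempDir(File parent, String prefix) throws IOException
    {
        for (int i = 0; i < 100; i++)
        {
            File dir = new File(parent, prefix + "-" + UUID.randomUUID());
            if (dir.mkdir())
                return dir;
        }
        throw new IOException("Unable to create temporary directory under " + parent);
    }
}
{code}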

The unit tests always exercise the Java6 version, and additionally test the Java7 
version when running on Java 7.


> Support native link w/o JNA in Java7
> 
>
> Key: CASSANDRA-3734
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3734
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Jonathan Ellis
>Assignee: Peter Schuller
>Priority: Minor
> Fix For: 1.2
>
> Attachments: CASSANDRA-3734-trunk-v1.txt
>
>
> Java7 provides native support for hard links: 
> http://docs.oracle.com/javase/7/docs/api/java/nio/file/Files.html#createLink(java.nio.file.Path,
>  java.nio.file.Path)
> We should prefer this method when Java7 is the host.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-3838) Repair Streaming hangs between multiple regions

2012-02-04 Thread Vijay (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vijay updated CASSANDRA-3838:
-

Attachment: 0001-CASSANDRA-3838.patch

> Repair Streaming hangs between multiple regions
> ---
>
> Key: CASSANDRA-3838
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3838
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 1.0.7
>Reporter: Vijay
>Assignee: Vijay
>Priority: Minor
> Fix For: 1.0.8
>
> Attachments: 0001-Add-streaming-socket-timeouts.patch, 
> 0001-CASSANDRA-3838.patch
>
>
> Streaming hangs between datacenters. Though there might be multiple reasons 
> for this, a simple fix is to add a socket timeout so the session can retry.
> The following is the netstats output of the affected node (the output below 
> remains this way for a very long period).
> [test_abrepairtest@test_abrepair--euwest1c-i-1adfb753 ~]$ nt netstats
> Mode: NORMAL
> Streaming to: /50.17.92.159
>/mnt/data/cassandra070/data/abtests/cust_allocs-hc-2221-Data.db 
> sections=7002 progress=1523325354/2475291786 - 61%
>/mnt/data/cassandra070/data/abtests/cust_allocs-hc-2233-Data.db 
> sections=4581 progress=0/595026085 - 0%
>/mnt/data/cassandra070/data/abtests/cust_allocs-g-2235-Data.db 
> sections=6631 progress=0/2270344837 - 0%
>/mnt/data/cassandra070/data/abtests/cust_allocs-hc-2239-Data.db 
> sections=6266 progress=0/2190197091 - 0%
>/mnt/data/cassandra070/data/abtests/cust_allocs-hc-2230-Data.db 
> sections=7662 progress=0/3082087770 - 0%
>/mnt/data/cassandra070/data/abtests/cust_allocs-hc-2240-Data.db 
> sections=7874 progress=0/587439833 - 0%
>/mnt/data/cassandra070/data/abtests/cust_allocs-g-2226-Data.db 
> sections=7682 progress=0/2933920085 - 0%
> "Streaming:1" daemon prio=10 tid=0x2aaac2060800 nid=0x1676 runnable 
> [0x6be85000]
>java.lang.Thread.State: RUNNABLE
> at java.net.SocketOutputStream.socketWrite0(Native Method)
> at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
> at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
> at 
> com.sun.net.ssl.internal.ssl.OutputRecord.writeBuffer(OutputRecord.java:297)
> at 
> com.sun.net.ssl.internal.ssl.OutputRecord.write(OutputRecord.java:286)
> at 
> com.sun.net.ssl.internal.ssl.SSLSocketImpl.writeRecordInternal(SSLSocketImpl.java:743)
> at 
> com.sun.net.ssl.internal.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:731)
> at 
> com.sun.net.ssl.internal.ssl.AppOutputStream.write(AppOutputStream.java:59)
> - locked <0x0006afea1bd8> (a 
> com.sun.net.ssl.internal.ssl.AppOutputStream)
> at 
> com.ning.compress.lzf.ChunkEncoder.encodeAndWriteChunk(ChunkEncoder.java:133)
> at 
> com.ning.compress.lzf.LZFOutputStream.writeCompressedBlock(LZFOutputStream.java:203)
> at 
> com.ning.compress.lzf.LZFOutputStream.flush(LZFOutputStream.java:117)
> at 
> org.apache.cassandra.streaming.FileStreamTask.stream(FileStreamTask.java:152)
> at 
> org.apache.cassandra.streaming.FileStreamTask.runMayThrow(FileStreamTask.java:91)
> at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> Streaming from: /46.51.141.51
>abtests: /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2241-Data.db 
> sections=7231 progress=0/1548922508 - 0%
>abtests: /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2231-Data.db 
> sections=4730 progress=0/296474156 - 0%
>abtests: /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2244-Data.db 
> sections=7650 progress=0/1580417610 - 0%
>abtests: /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2217-Data.db 
> sections=7682 progress=0/196689250 - 0%
>abtests: /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2220-Data.db 
> sections=7149 progress=0/478695185 - 0%
>abtests: /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2171-Data.db 
> sections=443 progress=0/78417320 - 0%
>abtests: /mnt/data/cassandra070/data/abtests/cust_allocs-g-2235-Data.db 
> sections=6631 progress=0/2270344837 - 0%
>abtests: /mnt/data/cassandra070/data/abtests/cust_allocs-hc--Data.db 
> sections=4590 progress=0/1310718798 - 0%
>abtests: /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2233-Data.db 
> sections=4581 progress=0/595026085 - 0%
>abtests: /mnt/data/cassandra070/data/abtests/cust_allocs-g-2226-Data.db 
> sections=7682 progress=0/2933920085 - 0%
>abtests: /mnt/data/cassandra070/da

[jira] [Updated] (CASSANDRA-3735) Fix "Unable to create hard link" SSTableReaderTest error messages

2012-02-04 Thread Peter Schuller (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Schuller updated CASSANDRA-3735:
--

Attachment: 0003-reset-file-index-generator-on-reset.patch

> Fix "Unable to create hard link" SSTableReaderTest error messages
> -
>
> Key: CASSANDRA-3735
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3735
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Jonathan Ellis
>Assignee: Jonathan Ellis
>Priority: Minor
> Attachments: 0001-fix-generation-update-in-loadNewSSTables.patch, 
> 0002-remove-incremental-backups-before-reloading-sstables-v2.patch, 
> 0002-remove-incremental-backups-before-reloading-sstables.patch, 
> 0003-reset-file-index-generator-on-reset.patch
>
>
> Sample failure (on Windows):
> {noformat}
> [junit] java.io.IOException: Exception while executing the command: cmd 
> /c mklink /H 
> C:\Users\Jonathan\projects\cassandra\git\build\test\cassandra\data\Keyspace1\backups\Standard1-hc-1-Index.db
>  
> c:\Users\Jonathan\projects\cassandra\git\build\test\cassandra\data\Keyspace1\Standard1-hc-1-Index.db,command
>  error Code: 1, command output: Cannot create a file when that file already 
> exists.
> [junit]
> [junit] at org.apache.cassandra.utils.CLibrary.exec(CLibrary.java:213)
> [junit] at 
> org.apache.cassandra.utils.CLibrary.createHardLinkWithExec(CLibrary.java:188)
> [junit] at 
> org.apache.cassandra.utils.CLibrary.createHardLink(CLibrary.java:151)
> [junit] at 
> org.apache.cassandra.io.sstable.SSTableReader.createLinks(SSTableReader.java:833)
> [junit] at 
> org.apache.cassandra.db.DataTracker$1.runMayThrow(DataTracker.java:161)
> [junit] at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
> [junit] at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
> [junit] at 
> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> [junit] at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> [junit] at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
> [junit] at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
> [junit] at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> [junit] at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> [junit] at java.lang.Thread.run(Thread.java:662)
> [junit] ERROR 17:10:17,111 Fatal exception in thread 
> Thread[NonPeriodicTasks:1,5,main]
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3735) Fix "Unable to create hard link" SSTableReaderTest error messages

2012-02-04 Thread Peter Schuller (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13200666#comment-13200666
 ] 

Peter Schuller commented on CASSANDRA-3735:
---

Correction: the conversion to post-CASSANDRA-2794 did remove the SSTableReaderTest 
failure. Or at least it's no longer happening for me on trunk with the attached 
patch (v2). Likely because it only removes the specific sstable components given 
by the iterator, rather than trying to recursively delete backups, but I don't 
pretend to understand the history of changes that caused the delete to start 
failing in the first place.

With respect to the 'Largest generation seen...' warning, I get that too, but I 
don't see any subsequent hard link creation failures, nor do I understand why I 
would if the files are created without using the counter and the quick fix just 
suppresses the warning. But I'm probably missing something. I do have failing 
hard linking in ThriftValidationTest, though. Maybe that is the side effect 
you're referring to?

In any case, attaching the trivial (if I understood the suggestion correctly) 
reset patch that suppresses the warning.




> Fix "Unable to create hard link" SSTableReaderTest error messages
> -
>
> Key: CASSANDRA-3735
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3735
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Jonathan Ellis
>Assignee: Jonathan Ellis
>Priority: Minor
> Attachments: 0001-fix-generation-update-in-loadNewSSTables.patch, 
> 0002-remove-incremental-backups-before-reloading-sstables-v2.patch, 
> 0002-remove-incremental-backups-before-reloading-sstables.patch, 
> 0003-reset-file-index-generator-on-reset.patch
>
>
> Sample failure (on Windows):
> {noformat}
> [junit] java.io.IOException: Exception while executing the command: cmd 
> /c mklink /H 
> C:\Users\Jonathan\projects\cassandra\git\build\test\cassandra\data\Keyspace1\backups\Standard1-hc-1-Index.db
>  
> c:\Users\Jonathan\projects\cassandra\git\build\test\cassandra\data\Keyspace1\Standard1-hc-1-Index.db,command
>  error Code: 1, command output: Cannot create a file when that file already 
> exists.
> [junit]
> [junit] at org.apache.cassandra.utils.CLibrary.exec(CLibrary.java:213)
> [junit] at 
> org.apache.cassandra.utils.CLibrary.createHardLinkWithExec(CLibrary.java:188)
> [junit] at 
> org.apache.cassandra.utils.CLibrary.createHardLink(CLibrary.java:151)
> [junit] at 
> org.apache.cassandra.io.sstable.SSTableReader.createLinks(SSTableReader.java:833)
> [junit] at 
> org.apache.cassandra.db.DataTracker$1.runMayThrow(DataTracker.java:161)
> [junit] at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
> [junit] at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
> [junit] at 
> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> [junit] at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> [junit] at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
> [junit] at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
> [junit] at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> [junit] at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> [junit] at java.lang.Thread.run(Thread.java:662)
> [junit] ERROR 17:10:17,111 Fatal exception in thread 
> Thread[NonPeriodicTasks:1,5,main]
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (CASSANDRA-3735) Fix "Unable to create hard link" SSTableReaderTest error messages

2012-02-04 Thread Peter Schuller (Issue Comment Edited) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13200642#comment-13200642
 ] 

Peter Schuller edited comment on CASSANDRA-3735 at 2/5/12 2:54 AM:
---

Attaching new version of 0002* that works (but still with the left-overs 
already mentioned by jbellis/sylvain) post CASSANDRA-2794.

  was (Author: scode):
Attaching new version of 0002* that works post CASSANDRA-2794.
  
> Fix "Unable to create hard link" SSTableReaderTest error messages
> -
>
> Key: CASSANDRA-3735
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3735
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Jonathan Ellis
>Assignee: Jonathan Ellis
>Priority: Minor
> Attachments: 0001-fix-generation-update-in-loadNewSSTables.patch, 
> 0002-remove-incremental-backups-before-reloading-sstables-v2.patch, 
> 0002-remove-incremental-backups-before-reloading-sstables.patch
>
>
> Sample failure (on Windows):
> {noformat}
> [junit] java.io.IOException: Exception while executing the command: cmd 
> /c mklink /H 
> C:\Users\Jonathan\projects\cassandra\git\build\test\cassandra\data\Keyspace1\backups\Standard1-hc-1-Index.db
>  
> c:\Users\Jonathan\projects\cassandra\git\build\test\cassandra\data\Keyspace1\Standard1-hc-1-Index.db,command
>  error Code: 1, command output: Cannot create a file when that file already 
> exists.
> [junit]
> [junit] at org.apache.cassandra.utils.CLibrary.exec(CLibrary.java:213)
> [junit] at 
> org.apache.cassandra.utils.CLibrary.createHardLinkWithExec(CLibrary.java:188)
> [junit] at 
> org.apache.cassandra.utils.CLibrary.createHardLink(CLibrary.java:151)
> [junit] at 
> org.apache.cassandra.io.sstable.SSTableReader.createLinks(SSTableReader.java:833)
> [junit] at 
> org.apache.cassandra.db.DataTracker$1.runMayThrow(DataTracker.java:161)
> [junit] at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
> [junit] at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
> [junit] at 
> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> [junit] at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> [junit] at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
> [junit] at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
> [junit] at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> [junit] at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> [junit] at java.lang.Thread.run(Thread.java:662)
> [junit] ERROR 17:10:17,111 Fatal exception in thread 
> Thread[NonPeriodicTasks:1,5,main]
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (CASSANDRA-3735) Fix "Unable to create hard link" SSTableReaderTest error messages

2012-02-04 Thread Peter Schuller (Issue Comment Edited) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13200642#comment-13200642
 ] 

Peter Schuller edited comment on CASSANDRA-3735 at 2/5/12 2:53 AM:
---

Attaching new version of 0002* that works post CASSANDRA-2794.

  was (Author: scode):
Attaching new version of 0002* that works post CASSANDRA_2794.
  
> Fix "Unable to create hard link" SSTableReaderTest error messages
> -
>
> Key: CASSANDRA-3735
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3735
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Jonathan Ellis
>Assignee: Jonathan Ellis
>Priority: Minor
> Attachments: 0001-fix-generation-update-in-loadNewSSTables.patch, 
> 0002-remove-incremental-backups-before-reloading-sstables-v2.patch, 
> 0002-remove-incremental-backups-before-reloading-sstables.patch
>
>
> Sample failure (on Windows):
> {noformat}
> [junit] java.io.IOException: Exception while executing the command: cmd 
> /c mklink /H 
> C:\Users\Jonathan\projects\cassandra\git\build\test\cassandra\data\Keyspace1\backups\Standard1-hc-1-Index.db
>  
> c:\Users\Jonathan\projects\cassandra\git\build\test\cassandra\data\Keyspace1\Standard1-hc-1-Index.db,command
>  error Code: 1, command output: Cannot create a file when that file already 
> exists.
> [junit]
> [junit] at org.apache.cassandra.utils.CLibrary.exec(CLibrary.java:213)
> [junit] at 
> org.apache.cassandra.utils.CLibrary.createHardLinkWithExec(CLibrary.java:188)
> [junit] at 
> org.apache.cassandra.utils.CLibrary.createHardLink(CLibrary.java:151)
> [junit] at 
> org.apache.cassandra.io.sstable.SSTableReader.createLinks(SSTableReader.java:833)
> [junit] at 
> org.apache.cassandra.db.DataTracker$1.runMayThrow(DataTracker.java:161)
> [junit] at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
> [junit] at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
> [junit] at 
> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> [junit] at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> [junit] at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
> [junit] at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
> [junit] at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> [junit] at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> [junit] at java.lang.Thread.run(Thread.java:662)
> [junit] ERROR 17:10:17,111 Fatal exception in thread 
> Thread[NonPeriodicTasks:1,5,main]
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-3735) Fix "Unable to create hard link" SSTableReaderTest error messages

2012-02-04 Thread Peter Schuller (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Schuller updated CASSANDRA-3735:
--

Attachment: 
0002-remove-incremental-backups-before-reloading-sstables-v2.patch

Attaching new version of 0002* that works post CASSANDRA-2794.

> Fix "Unable to create hard link" SSTableReaderTest error messages
> -
>
> Key: CASSANDRA-3735
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3735
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Jonathan Ellis
>Assignee: Jonathan Ellis
>Priority: Minor
> Attachments: 0001-fix-generation-update-in-loadNewSSTables.patch, 
> 0002-remove-incremental-backups-before-reloading-sstables-v2.patch, 
> 0002-remove-incremental-backups-before-reloading-sstables.patch
>
>
> Sample failure (on Windows):
> {noformat}
> [junit] java.io.IOException: Exception while executing the command: cmd 
> /c mklink /H 
> C:\Users\Jonathan\projects\cassandra\git\build\test\cassandra\data\Keyspace1\backups\Standard1-hc-1-Index.db
>  
> c:\Users\Jonathan\projects\cassandra\git\build\test\cassandra\data\Keyspace1\Standard1-hc-1-Index.db,command
>  error Code: 1, command output: Cannot create a file when that file already 
> exists.
> [junit]
> [junit] at org.apache.cassandra.utils.CLibrary.exec(CLibrary.java:213)
> [junit] at 
> org.apache.cassandra.utils.CLibrary.createHardLinkWithExec(CLibrary.java:188)
> [junit] at 
> org.apache.cassandra.utils.CLibrary.createHardLink(CLibrary.java:151)
> [junit] at 
> org.apache.cassandra.io.sstable.SSTableReader.createLinks(SSTableReader.java:833)
> [junit] at 
> org.apache.cassandra.db.DataTracker$1.runMayThrow(DataTracker.java:161)
> [junit] at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
> [junit] at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
> [junit] at 
> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> [junit] at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> [junit] at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
> [junit] at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
> [junit] at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> [junit] at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> [junit] at java.lang.Thread.run(Thread.java:662)
> [junit] ERROR 17:10:17,111 Fatal exception in thread 
> Thread[NonPeriodicTasks:1,5,main]
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3710) Add a configuration option to disable snapshots

2012-02-04 Thread Christian Spriegel (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13200629#comment-13200629
 ] 

Christian Spriegel commented on CASSANDRA-3710:
---

Hi Peter,
it is good to know that we are not the only ones having this problem.

The approaches you described are not really suitable for our application. Empty 
or old rows would confuse the application. That's why I created a pretty 
radical patch for Cassandra:

It adds a new config setting called 'test_mode_enabled'. If set to true, it 
disables the commitlog, disables snapshots, and disables memtable flushes for 
truncates.

I uploaded it; maybe it is useful for your tests too.

Christian

> Add a configuration option to disable snapshots
> ---
>
> Key: CASSANDRA-3710
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3710
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Brandon Williams
> Fix For: 1.0.8
>
> Attachments: Cassandra107Patch_TestModeV1.txt
>
>
> Let me first say, I hate this idea.  It gives Cassandra the ability to 
> permanently delete data at a large scale without any means of recovery.  
> However, I've seen this requested multiple times, and it is in fact useful in 
> some scenarios, such as when your application is using an embedded Cassandra 
> instance for testing and needs to truncate, which without JNA will time out 
> more often than not.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-3710) Add a configuration option to disable snapshots

2012-02-04 Thread Christian Spriegel (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christian Spriegel updated CASSANDRA-3710:
--

Attachment: Cassandra107Patch_TestModeV1.txt

added 'testmode' patch

> Add a configuration option to disable snapshots
> ---
>
> Key: CASSANDRA-3710
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3710
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Brandon Williams
> Fix For: 1.0.8
>
> Attachments: Cassandra107Patch_TestModeV1.txt
>
>
> Let me first say, I hate this idea.  It gives Cassandra the ability to 
> permanently delete data at a large scale without any means of recovery.  
> However, I've seen this requested multiple times, and it is in fact useful in 
> some scenarios, such as when your application is using an embedded Cassandra 
> instance for testing and needs to truncate, which without JNA will time out 
> more often than not.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3838) Repair Streaming hangs between multiple regions

2012-02-04 Thread Vijay (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13200619#comment-13200619
 ] 

Vijay commented on CASSANDRA-3838:
--

>>> In either case, definitely don't use rpc timeout IMO; the concerns are 
>>> completely different. A low-timeout cluster with an rpc timeout of 0.5 
>>> seconds
We will add a configuration, streaming_socket_timeout, which will be different 
from rpc_timeout.

>>> If this (socket timeouts) does go in, I argue even more strongly than 
>>> before that the tear-down of streams due to failure detector as in 
>>> CASSANDRA-3569
I don't have any opinion on that ticket, but it looks reasonable. I would say 
so_timeout is a better solution for streaming since these are not long-lived 
connections... but I also think keep-alive should be set for the messaging 
connection, as you mentioned in the other ticket.

>>> I do believe though that if you don't care about having to wait for a few 
>>> hours for streams to abort
We definitely don't want to wait for hours, and I don't think we have to when 
we have a better option, even if we set streaming_socket_timeout to 30 seconds 
or even a minute.

>>> As for reads vs. writes: You definitely want timeouts on both sides in 
>>> order to guarantee that you never hang under any circumstance
Agreed, I will get the patch done in a few minutes.
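
For illustration only (this is a sketch, not the attached patch; the timeout name 
follows the setting proposed above), the read side of such a timeout would look 
roughly like:

{code}
import java.io.InputStream;
import java.net.Socket;
import java.net.SocketTimeoutException;

public final class StreamingReceiveSketch
{
    // Hypothetical receive loop: if no bytes arrive within the configured
    // streaming socket timeout, read() throws SocketTimeoutException and the
    // session can be torn down and retried instead of hanging forever.
    public static void receive(Socket socket, int streamingSocketTimeoutMillis) throws Exception
    {
        socket.setSoTimeout(streamingSocketTimeoutMillis);
        InputStream in = socket.getInputStream();
        byte[] buffer = new byte[64 * 1024];
        try
        {
            int n;
            while ((n = in.read(buffer)) != -1)
            {
                // hand the n bytes just read to the stream session here
            }
        }
        catch (SocketTimeoutException e)
        {
            // no data for streamingSocketTimeoutMillis: abort and let the session retry
            throw e;
        }
    }
}
{code}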

> Repair Streaming hangs between multiple regions
> ---
>
> Key: CASSANDRA-3838
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3838
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 1.0.7
>Reporter: Vijay
>Assignee: Vijay
>Priority: Minor
> Fix For: 1.0.8
>
> Attachments: 0001-Add-streaming-socket-timeouts.patch
>
>
> Streaming hangs between datacenters. Though there might be multiple reasons 
> for this, a simple fix is to add a socket timeout so the session can retry.
> The following is the netstats output of the affected node (the output below 
> remains this way for a very long period).
> [test_abrepairtest@test_abrepair--euwest1c-i-1adfb753 ~]$ nt netstats
> Mode: NORMAL
> Streaming to: /50.17.92.159
>/mnt/data/cassandra070/data/abtests/cust_allocs-hc-2221-Data.db 
> sections=7002 progress=1523325354/2475291786 - 61%
>/mnt/data/cassandra070/data/abtests/cust_allocs-hc-2233-Data.db 
> sections=4581 progress=0/595026085 - 0%
>/mnt/data/cassandra070/data/abtests/cust_allocs-g-2235-Data.db 
> sections=6631 progress=0/2270344837 - 0%
>/mnt/data/cassandra070/data/abtests/cust_allocs-hc-2239-Data.db 
> sections=6266 progress=0/2190197091 - 0%
>/mnt/data/cassandra070/data/abtests/cust_allocs-hc-2230-Data.db 
> sections=7662 progress=0/3082087770 - 0%
>/mnt/data/cassandra070/data/abtests/cust_allocs-hc-2240-Data.db 
> sections=7874 progress=0/587439833 - 0%
>/mnt/data/cassandra070/data/abtests/cust_allocs-g-2226-Data.db 
> sections=7682 progress=0/2933920085 - 0%
> "Streaming:1" daemon prio=10 tid=0x2aaac2060800 nid=0x1676 runnable 
> [0x6be85000]
>java.lang.Thread.State: RUNNABLE
> at java.net.SocketOutputStream.socketWrite0(Native Method)
> at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
> at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
> at 
> com.sun.net.ssl.internal.ssl.OutputRecord.writeBuffer(OutputRecord.java:297)
> at 
> com.sun.net.ssl.internal.ssl.OutputRecord.write(OutputRecord.java:286)
> at 
> com.sun.net.ssl.internal.ssl.SSLSocketImpl.writeRecordInternal(SSLSocketImpl.java:743)
> at 
> com.sun.net.ssl.internal.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:731)
> at 
> com.sun.net.ssl.internal.ssl.AppOutputStream.write(AppOutputStream.java:59)
> - locked <0x0006afea1bd8> (a 
> com.sun.net.ssl.internal.ssl.AppOutputStream)
> at 
> com.ning.compress.lzf.ChunkEncoder.encodeAndWriteChunk(ChunkEncoder.java:133)
> at 
> com.ning.compress.lzf.LZFOutputStream.writeCompressedBlock(LZFOutputStream.java:203)
> at 
> com.ning.compress.lzf.LZFOutputStream.flush(LZFOutputStream.java:117)
> at 
> org.apache.cassandra.streaming.FileStreamTask.stream(FileStreamTask.java:152)
> at 
> org.apache.cassandra.streaming.FileStreamTask.runMayThrow(FileStreamTask.java:91)
> at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> Streaming from: /46.51.141.51
>abtests: /mnt/data/cassandra070/data/a

[jira] [Commented] (CASSANDRA-3838) Repair Streaming hangs between multiple regions

2012-02-04 Thread Peter Schuller (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13200617#comment-13200617
 ] 

Peter Schuller commented on CASSANDRA-3838:
---

Let me be more clear about why keep-alive is better.

TCP keep-alive is at the transport level, and thus independent of in-band data 
(or lack thereof). Imagine that you're implementing a remote procedure call 
protocol where the client sends:

{code}
INVOKE name-of-process arg1 arg2
{code}

The server invokes the method, and responds:

{code}
RET success|failure exit-value|exception
{code}

The first thing you need, if you are using this in some kind of production 
scenario, is to ensure that requests can time out. But there is a problem. 
Suppose you assume that this software runs on well-connected networks at a high 
number of requests per second; then there is no reason not to time out requests 
quickly if the remote host is unreachable. So you set a socket timeout of 1 
second. The only problem is that it will also time out requests that take longer 
than 1 second because the method call legitimately took longer.

The conflict happens because the selection of timeout was made based on the 
transport level circumstances (fast local network, high throughput, no need to 
wait if a host is down) while the effect of the timeout is at the in-band data 
level and is thus triggered by a slow request.

One way to fix this is to extend the protocol between client and server such 
that they can constantly be exchanging PING/PONG type messages (witness IRC for 
an example of this). This allows you to utilize socket (or read/write op) 
timeouts to detect a broken transport, under the assumption/premise that both 
sides have dedicated code for the ping/pong stuff which is independent of any 
delay in processing the otherwise in-band data.

Disadvantages of this approach can include the need to actually change the 
protocol, and (depending on implementation) additional implementation 
complexity as you suddenly need to actively model the transport as such.

TCP keep-alive is a way to let the kernel/tcp, which is already supposed to 
support this, deal with this without adding complexity to the application. It 
allows what effectively boils down to a "timeout" at the transport level which 
can be selected based on use-case and expected networking characteristics, and 
is independent of the nature of the in-band data sent over that transport.

In the Cassandra case, the equivalent of the slow RPC call might be that a 
write() during streaming blocks for 5 seconds because socket buffers on both 
ends are full, and the other end is doing a GC or waiting on an fsync().

By using keep-alives we get more "correct" behavior in that such blocks won't 
cause connection tear-downs, while at the same time not having to change the 
protocol and/or add complexity to the code base to implement a 
protocol-within-tcp in which to mux the actual payload for streaming.
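
To make that concrete: at the JDK level, enabling keep-alive on the streaming 
socket is a one-liner; the surrounding setup below is a hypothetical sketch, not 
code from Cassandra:

{code}
import java.net.InetSocketAddress;
import java.net.Socket;

public final class KeepAliveSketch
{
    // SO_KEEPALIVE makes the kernel probe the peer while the connection is idle,
    // so a dead transport is detected even when neither side has in-band data to send.
    public static Socket openStreamingSocket(InetSocketAddress peer) throws Exception
    {
        Socket socket = new Socket();
        socket.setKeepAlive(true); // transport-level liveness, independent of payload
        socket.connect(peer);
        return socket;
    }
}
{code}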


> Repair Streaming hangs between multiple regions
> ---
>
> Key: CASSANDRA-3838
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3838
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 1.0.7
>Reporter: Vijay
>Assignee: Vijay
>Priority: Minor
> Fix For: 1.0.8
>
> Attachments: 0001-Add-streaming-socket-timeouts.patch
>
>
> Streaming hangs between datacenters. Though there might be multiple reasons 
> for this, a simple fix is to add a socket timeout so the session can retry.
> The following is the netstats output of the affected node (the output below 
> remains this way for a very long period).
> [test_abrepairtest@test_abrepair--euwest1c-i-1adfb753 ~]$ nt netstats
> Mode: NORMAL
> Streaming to: /50.17.92.159
>/mnt/data/cassandra070/data/abtests/cust_allocs-hc-2221-Data.db 
> sections=7002 progress=1523325354/2475291786 - 61%
>/mnt/data/cassandra070/data/abtests/cust_allocs-hc-2233-Data.db 
> sections=4581 progress=0/595026085 - 0%
>/mnt/data/cassandra070/data/abtests/cust_allocs-g-2235-Data.db 
> sections=6631 progress=0/2270344837 - 0%
>/mnt/data/cassandra070/data/abtests/cust_allocs-hc-2239-Data.db 
> sections=6266 progress=0/2190197091 - 0%
>/mnt/data/cassandra070/data/abtests/cust_allocs-hc-2230-Data.db 
> sections=7662 progress=0/3082087770 - 0%
>/mnt/data/cassandra070/data/abtests/cust_allocs-hc-2240-Data.db 
> sections=7874 progress=0/587439833 - 0%
>/mnt/data/cassandra070/data/abtests/cust_allocs-g-2226-Data.db 
> sections=7682 progress=0/2933920085 - 0%
> "Streaming:1" daemon prio=10 tid=0x2aaac2060800 nid=0x1676 runnable 
> [0x6be85000]
>java.lang.Thread.State: RUNNABLE
> at java.net.SocketOutputStream.soc

[jira] [Commented] (CASSANDRA-3569) Failure detector downs should not break streams

2012-02-04 Thread Peter Schuller (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13200618#comment-13200618
 ] 

Peter Schuller commented on CASSANDRA-3569:
---

Better description of why I believe TCP keep-alive to be the "correct" choice 
(unless we change the protocol): 
https://issues.apache.org/jira/browse/CASSANDRA-3838?focusedCommentId=13200617&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13200617


> Failure detector downs should not break streams
> ---
>
> Key: CASSANDRA-3569
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3569
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Peter Schuller
>Assignee: Peter Schuller
>
> CASSANDRA-2433 introduced this behavior just to keep repairs from sitting 
> there waiting forever. In my opinion the correct fix to that problem is to 
> use TCP keep-alive. Unfortunately the TCP keep-alive period is insanely high 
> by default on a modern Linux, so just doing that is not entirely good either.
> But using the failure detector seems nonsensical to me. We have a 
> communication channel, the TCP transport, that we know is used for 
> long-running processes that we don't want killed incorrectly for no good 
> reason, and we are using a failure detector, tuned for deciding when not to 
> send real-time-sensitive requests to nodes, to actively kill a working 
> connection.
> So, rather than add complexity with protocol-based ping/pongs and such, I 
> propose that we simply use TCP keep-alive for streaming connections and 
> instruct operators of production clusters to tweak 
> net.ipv4.tcp_keepalive_{probes,intvl} as appropriate (or whatever the 
> equivalent is on their OS).
> I can submit the patch. Awaiting opinions.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3569) Failure detector downs should not break streams

2012-02-04 Thread Peter Schuller (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13200613#comment-13200613
 ] 

Peter Schuller commented on CASSANDRA-3569:
---

There is some sanity! :) It turns out that on Linux specifically you can set 
per-socket keep-alive socket options. From tcp(7):

{code}
   TCP_KEEPCNT (since Linux 2.4)
          The maximum number of keepalive probes TCP should send before
          dropping the connection. This option should not be used in code
          intended to be portable.

   TCP_KEEPIDLE (since Linux 2.4)
          The time (in seconds) the connection needs to remain idle before
          TCP starts sending keepalive probes, if the socket option
          SO_KEEPALIVE has been set on this socket. This option should not
          be used in code intended to be portable.

   TCP_KEEPINTVL (since Linux 2.4)
          The time (in seconds) between individual keepalive probes. This
          option should not be used in code intended to be portable.
{code}

This suddenly makes it insanely more usable to us, with the caveat of 
portability.
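
Plain Java of that era cannot set these options per socket without native code, 
which is exactly the portability caveat. Purely as an illustration of what the 
three knobs map to, on a much later JDK (11+) the equivalent would look roughly 
like this (not something the patch for this ticket could use):

{code}
import java.net.InetSocketAddress;
import java.net.StandardSocketOptions;
import java.nio.channels.SocketChannel;
import jdk.net.ExtendedSocketOptions;

public final class PerSocketKeepAliveSketch
{
    // Illustration only (JDK 11+, Linux): declare a dead peer after roughly
    // TCP_KEEPIDLE + TCP_KEEPCOUNT * TCP_KEEPINTERVAL seconds of silence.
    public static SocketChannel open(InetSocketAddress peer) throws Exception
    {
        SocketChannel ch = SocketChannel.open();
        ch.setOption(StandardSocketOptions.SO_KEEPALIVE, true);
        ch.setOption(ExtendedSocketOptions.TCP_KEEPIDLE, 60);     // idle seconds before first probe
        ch.setOption(ExtendedSocketOptions.TCP_KEEPINTERVAL, 10); // seconds between probes
        ch.setOption(ExtendedSocketOptions.TCP_KEEPCOUNT, 6);     // unanswered probes before drop
        ch.connect(peer);
        return ch;
    }
}
{code}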


> Failure detector downs should not break streams
> ---
>
> Key: CASSANDRA-3569
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3569
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Peter Schuller
>Assignee: Peter Schuller
>
> CASSANDRA-2433 introduced this behavior just to keep repairs from sitting 
> there waiting forever. In my opinion the correct fix to that problem is to 
> use TCP keep-alive. Unfortunately the TCP keep-alive period is insanely high 
> by default on a modern Linux, so just doing that is not entirely good either.
> But using the failure detector seems nonsensical to me. We have a 
> communication channel, the TCP transport, that we know is used for 
> long-running processes that we don't want killed incorrectly for no good 
> reason, and we are using a failure detector, tuned for deciding when not to 
> send real-time-sensitive requests to nodes, to actively kill a working 
> connection.
> So, rather than add complexity with protocol-based ping/pongs and such, I 
> propose that we simply use TCP keep-alive for streaming connections and 
> instruct operators of production clusters to tweak 
> net.ipv4.tcp_keepalive_{probes,intvl} as appropriate (or whatever the 
> equivalent is on their OS).
> I can submit the patch. Awaiting opinions.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3838) Repair Streaming hangs between multiple regions

2012-02-04 Thread Peter Schuller (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13200612#comment-13200612
 ] 

Peter Schuller commented on CASSANDRA-3838:
---

Vijay, I do believe though that if you don't care about having to wait a few 
hours for streams to abort, simply setting keep-alive is the easiest and 
least-likely-to-have-negative-side-effects fix for your problem with inter-DC 
streams.


> Repair Streaming hangs between multiple regions
> ---
>
> Key: CASSANDRA-3838
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3838
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 1.0.7
>Reporter: Vijay
>Assignee: Vijay
>Priority: Minor
> Fix For: 1.0.8
>
> Attachments: 0001-Add-streaming-socket-timeouts.patch
>
>
> Streaming hangs between datacenters. Though there might be multiple reasons 
> for this, a simple fix is to add a socket timeout so the session can retry.
> The following is the netstats output of the affected node (the output below 
> remains this way for a very long period).
> [test_abrepairtest@test_abrepair--euwest1c-i-1adfb753 ~]$ nt netstats
> Mode: NORMAL
> Streaming to: /50.17.92.159
>/mnt/data/cassandra070/data/abtests/cust_allocs-hc-2221-Data.db 
> sections=7002 progress=1523325354/2475291786 - 61%
>/mnt/data/cassandra070/data/abtests/cust_allocs-hc-2233-Data.db 
> sections=4581 progress=0/595026085 - 0%
>/mnt/data/cassandra070/data/abtests/cust_allocs-g-2235-Data.db 
> sections=6631 progress=0/2270344837 - 0%
>/mnt/data/cassandra070/data/abtests/cust_allocs-hc-2239-Data.db 
> sections=6266 progress=0/2190197091 - 0%
>/mnt/data/cassandra070/data/abtests/cust_allocs-hc-2230-Data.db 
> sections=7662 progress=0/3082087770 - 0%
>/mnt/data/cassandra070/data/abtests/cust_allocs-hc-2240-Data.db 
> sections=7874 progress=0/587439833 - 0%
>/mnt/data/cassandra070/data/abtests/cust_allocs-g-2226-Data.db 
> sections=7682 progress=0/2933920085 - 0%
> "Streaming:1" daemon prio=10 tid=0x2aaac2060800 nid=0x1676 runnable 
> [0x6be85000]
>java.lang.Thread.State: RUNNABLE
> at java.net.SocketOutputStream.socketWrite0(Native Method)
> at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
> at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
> at 
> com.sun.net.ssl.internal.ssl.OutputRecord.writeBuffer(OutputRecord.java:297)
> at 
> com.sun.net.ssl.internal.ssl.OutputRecord.write(OutputRecord.java:286)
> at 
> com.sun.net.ssl.internal.ssl.SSLSocketImpl.writeRecordInternal(SSLSocketImpl.java:743)
> at 
> com.sun.net.ssl.internal.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:731)
> at 
> com.sun.net.ssl.internal.ssl.AppOutputStream.write(AppOutputStream.java:59)
> - locked <0x0006afea1bd8> (a 
> com.sun.net.ssl.internal.ssl.AppOutputStream)
> at 
> com.ning.compress.lzf.ChunkEncoder.encodeAndWriteChunk(ChunkEncoder.java:133)
> at 
> com.ning.compress.lzf.LZFOutputStream.writeCompressedBlock(LZFOutputStream.java:203)
> at 
> com.ning.compress.lzf.LZFOutputStream.flush(LZFOutputStream.java:117)
> at 
> org.apache.cassandra.streaming.FileStreamTask.stream(FileStreamTask.java:152)
> at 
> org.apache.cassandra.streaming.FileStreamTask.runMayThrow(FileStreamTask.java:91)
> at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> Streaming from: /46.51.141.51
>abtests: /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2241-Data.db 
> sections=7231 progress=0/1548922508 - 0%
>abtests: /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2231-Data.db 
> sections=4730 progress=0/296474156 - 0%
>abtests: /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2244-Data.db 
> sections=7650 progress=0/1580417610 - 0%
>abtests: /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2217-Data.db 
> sections=7682 progress=0/196689250 - 0%
>abtests: /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2220-Data.db 
> sections=7149 progress=0/478695185 - 0%
>abtests: /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2171-Data.db 
> sections=443 progress=0/78417320 - 0%
>abtests: /mnt/data/cassandra070/data/abtests/cust_allocs-g-2235-Data.db 
> sections=6631 progress=0/2270344837 - 0%
>abtests: /mnt/data/cassandra070/data/abtests/cust_allocs-hc--Data.db 
> sections=4590 progress=0/1310718798 - 0%
>abtests: /mnt/data/

[jira] [Commented] (CASSANDRA-3838) Repair Streaming hangs between multiple regions

2012-02-04 Thread Peter Schuller (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13200610#comment-13200610
 ] 

Peter Schuller commented on CASSANDRA-3838:
---

Note that simply adding a socket timeout is not a good idea unless both sides 
are truly expected to never starve (this is why I didn't suggest it for 
CASSANDRA-3569, and why TCP keep-alive is the "correct" solution: it does not 
generate spurious timeouts due to a lack of in-band data on the channel - but 
as noted in that ticket, the practical reality is that we don't control 
keep-alive parameters on a per-socket basis).

For example, if one of the ends is waiting a few seconds for a particularly 
expensive fsync(), or waiting for some kind of lock, you'd get spurious 
failures (whereas this is not the case for keep-alive, because the transport is 
alive and kicking at the kernel level). Depending on the surrounding logic, it 
could be dangerous if it causes the receiver to believe it received the file 
while the sender believes it didn't (e.g. multiple streaming -> disk space 
explosion).

I would suggest TCP keep-alive for the reasons mentioned here and discussed in 
CASSANDRA-3569, and suggest that the TCP keep-alive settings be tweaked to fail 
quicker if that is desired.

If adding a socket timeout, thought needs to go into what kind of false failure 
cases will be created. If both ends are truly expected not to block on anything 
like compaction locks or whatever else there might be, it might be okay.

In either case, definitely *don't* use rpc timeout IMO; the concerns are 
completely different. A low-timeout cluster with an rpc timeout of 0.5 seconds, 
for example, would be extremely sensitive to even the slightest hiccup (such as 
waiting 1 second for an fsync(), or a GC pause, etc.) and it would be truly 
useless and extremely damaging to kill streams for that.

In general, as with CASSANDRA-3569, I strongly argue that streaming should not 
be caused to spuriously fail because the impact of that can be huge, 
particularly on clusters with large nodes.

As for reads vs. writes: You definitely want timeouts on both sides in order to 
guarantee that you never hang under any circumstance regardless of the nature 
of the TCP connection loss, unless you have some other method to accomplish the 
same thing.

If this (socket timeouts) does go in, I argue even more strongly than before 
that the tear-down of streams due to failure detector as in CASSANDRA-3569 is 
truly just negative rather than positive (but as noted in that ticket, not 
hanging forever on repairs and such remains a concern).


> Repair Streaming hangs between multiple regions
> ---
>
> Key: CASSANDRA-3838
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3838
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 1.0.7
>Reporter: Vijay
>Assignee: Vijay
>Priority: Minor
> Fix For: 1.0.8
>
> Attachments: 0001-Add-streaming-socket-timeouts.patch
>
>
> Streaming hangs between datacenters. Though there might be multiple reasons 
> for this, a simple fix is to add a socket timeout so the session can retry.
> The following is the netstats output of the affected node (the output below 
> remains this way for a very long period).
> [test_abrepairtest@test_abrepair--euwest1c-i-1adfb753 ~]$ nt netstats
> Mode: NORMAL
> Streaming to: /50.17.92.159
>/mnt/data/cassandra070/data/abtests/cust_allocs-hc-2221-Data.db 
> sections=7002 progress=1523325354/2475291786 - 61%
>/mnt/data/cassandra070/data/abtests/cust_allocs-hc-2233-Data.db 
> sections=4581 progress=0/595026085 - 0%
>/mnt/data/cassandra070/data/abtests/cust_allocs-g-2235-Data.db 
> sections=6631 progress=0/2270344837 - 0%
>/mnt/data/cassandra070/data/abtests/cust_allocs-hc-2239-Data.db 
> sections=6266 progress=0/2190197091 - 0%
>/mnt/data/cassandra070/data/abtests/cust_allocs-hc-2230-Data.db 
> sections=7662 progress=0/3082087770 - 0%
>/mnt/data/cassandra070/data/abtests/cust_allocs-hc-2240-Data.db 
> sections=7874 progress=0/587439833 - 0%
>/mnt/data/cassandra070/data/abtests/cust_allocs-g-2226-Data.db 
> sections=7682 progress=0/2933920085 - 0%
> "Streaming:1" daemon prio=10 tid=0x2aaac2060800 nid=0x1676 runnable 
> [0x6be85000]
>java.lang.Thread.State: RUNNABLE
> at java.net.SocketOutputStream.socketWrite0(Native Method)
> at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
> at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
> at 
> com.sun.net.ssl.internal.ssl.OutputRecord.writeBuffer(OutputRecord.java:297)
> at 
> com.sun.net.ssl.internal.ssl.OutputRecord.write(OutputRecord.java:286)
> at 
>

[jira] [Commented] (CASSANDRA-3838) Repair Streaming hangs between multiple regions

2012-02-04 Thread Vijay (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13200566#comment-13200566
 ] 

Vijay commented on CASSANDRA-3838:
--

Hi Sylvain,
My observation on this is that when there is network congestion the routers 
will start to drop packets, which causes the write on the socket to hang. Until 
we write to the socket again we will not know whether the socket is closed or 
not... hence it is better to have the timeout on both sides.

I will add streaming_socket_timeout and add documentation in the next patch, if 
you are OK with the above. Thanks!

> Repair Streaming hangs between multiple regions
> ---
>
> Key: CASSANDRA-3838
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3838
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 1.0.7
>Reporter: Vijay
>Assignee: Vijay
>Priority: Minor
> Fix For: 1.0.8
>
> Attachments: 0001-Add-streaming-socket-timeouts.patch
>
>
> Streaming hangs between datacenters. Though there might be multiple reasons 
> for this, a simple fix is to add a socket timeout so the session can retry.
> The following is the netstats output of the affected node (the output below 
> remains this way for a very long period).
> [test_abrepairtest@test_abrepair--euwest1c-i-1adfb753 ~]$ nt netstats
> Mode: NORMAL
> Streaming to: /50.17.92.159
>/mnt/data/cassandra070/data/abtests/cust_allocs-hc-2221-Data.db 
> sections=7002 progress=1523325354/2475291786 - 61%
>/mnt/data/cassandra070/data/abtests/cust_allocs-hc-2233-Data.db 
> sections=4581 progress=0/595026085 - 0%
>/mnt/data/cassandra070/data/abtests/cust_allocs-g-2235-Data.db 
> sections=6631 progress=0/2270344837 - 0%
>/mnt/data/cassandra070/data/abtests/cust_allocs-hc-2239-Data.db 
> sections=6266 progress=0/2190197091 - 0%
>/mnt/data/cassandra070/data/abtests/cust_allocs-hc-2230-Data.db 
> sections=7662 progress=0/3082087770 - 0%
>/mnt/data/cassandra070/data/abtests/cust_allocs-hc-2240-Data.db 
> sections=7874 progress=0/587439833 - 0%
>/mnt/data/cassandra070/data/abtests/cust_allocs-g-2226-Data.db 
> sections=7682 progress=0/2933920085 - 0%
> "Streaming:1" daemon prio=10 tid=0x2aaac2060800 nid=0x1676 runnable 
> [0x6be85000]
>java.lang.Thread.State: RUNNABLE
> at java.net.SocketOutputStream.socketWrite0(Native Method)
> at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
> at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
> at 
> com.sun.net.ssl.internal.ssl.OutputRecord.writeBuffer(OutputRecord.java:297)
> at 
> com.sun.net.ssl.internal.ssl.OutputRecord.write(OutputRecord.java:286)
> at 
> com.sun.net.ssl.internal.ssl.SSLSocketImpl.writeRecordInternal(SSLSocketImpl.java:743)
> at 
> com.sun.net.ssl.internal.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:731)
> at 
> com.sun.net.ssl.internal.ssl.AppOutputStream.write(AppOutputStream.java:59)
> - locked <0x0006afea1bd8> (a 
> com.sun.net.ssl.internal.ssl.AppOutputStream)
> at 
> com.ning.compress.lzf.ChunkEncoder.encodeAndWriteChunk(ChunkEncoder.java:133)
> at 
> com.ning.compress.lzf.LZFOutputStream.writeCompressedBlock(LZFOutputStream.java:203)
> at 
> com.ning.compress.lzf.LZFOutputStream.flush(LZFOutputStream.java:117)
> at 
> org.apache.cassandra.streaming.FileStreamTask.stream(FileStreamTask.java:152)
> at 
> org.apache.cassandra.streaming.FileStreamTask.runMayThrow(FileStreamTask.java:91)
> at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> Streaming from: /46.51.141.51
>abtests: /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2241-Data.db 
> sections=7231 progress=0/1548922508 - 0%
>abtests: /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2231-Data.db 
> sections=4730 progress=0/296474156 - 0%
>abtests: /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2244-Data.db 
> sections=7650 progress=0/1580417610 - 0%
>abtests: /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2217-Data.db 
> sections=7682 progress=0/196689250 - 0%
>abtests: /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2220-Data.db 
> sections=7149 progress=0/478695185 - 0%
>abtests: /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2171-Data.db 
> sections=443 progress=0/78417320 - 0%
>abtests: /mnt/data/cassandra070/data/abtests/cust_allocs-g-2235-Data.db 
> sect

[jira] [Commented] (CASSANDRA-3831) scaling to large clusters in GossipStage impossible due to calculatePendingRanges

2012-02-04 Thread Peter Schuller (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13200524#comment-13200524
 ] 

Peter Schuller commented on CASSANDRA-3831:
---

I agree (I did say that myself already ;)). The memoization (+ being ready to 
change the cluster-wide phi convict threshold through JMX) was just the safest 
way to fix the situation on our production cluster so that we could continue to 
add capacity. It was never intended as a suggested fix, but I still wanted to 
upload it instead of keeping the patch private, in case it helps someone.

But the larger issue is that calculatePendingRanges must be faster to begin 
with. Even if only called once, if it takes 1-4 seconds on a ~180-node cluster 
and it's worse than {{O(n^3)}}, it's *way* too slow and won't scale: first 
because of the failure detector, and at some point it's just too slow to even 
wait for the calculation to complete at all (from a RING_DELAY standpoint, for 
example).

I'll see later this weekend about doing more tests on trunk to confirm or deny 
whether it is getting called multiple times. As I indicated, I never confirmed 
that particular bit on trunk and it's very possible it doesn't happen there.

I haven't had time to seriously look at suggesting changes to fix the 
computational complexity. Might be very easy for all I know; I just haven't 
looked at it yet.


> scaling to large clusters in GossipStage impossible due to 
> calculatePendingRanges 
> --
>
> Key: CASSANDRA-3831
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3831
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Peter Schuller
>Assignee: Peter Schuller
>Priority: Critical
> Attachments: CASSANDRA-3831-memoization-not-for-inclusion.txt
>
>
> (most observations below are from 0.8, but I just now tested on
> trunk and I can trigger this problem *just* by bootstrapping a ~180
> node cluster concurrently, presumably due to the number of nodes that
> are simultaneously in bootstrap state)
> It turns out that:
> * (1) calculatePendingRanges is not just expensive, it's computationally 
> complex - cubic or worse
> * (2) it gets called *NOT* just once per node being bootstrapped/leaving etc, 
> but is called repeatedly *while* nodes are in these states
> As a result, clusters start exploding when you start reaching 100-300
> nodes. The GossipStage will get backed up because a single
> calculatePendingRanges takes seconds, and depending on what the
> average heartbeat interval is in relation to this, this can lead to
> *massive* cluster-wide flapping.
> This all started because we hit this in production; several nodes
> would start flapping several other nodes as down, with many nodes
> seeing the entire cluster, or a large portion of it, as down. Logging
> in to some of these nodes you would see that they would be constantly
> flapping up/down for minutes at a time until one became lucky and it
> stabilized.
> In the end we had to perform an emergency full-cluster restart with
> gossip patched to force-forget certain nodes in bootstrapping state.
> I can't go into all details here from the post-mortem (just the
> write-up would take a day), but in short:
> * We graphed the number of hosts in the cluster that had more than 5
>   Down (in a cluster that should have 0 down) on a minutely timeline.
> * We also graphed the number of hosts in the cluster that had GossipStage 
> backed up.
> * The two graphs correlated *extremely* well
> * jstack sampling showed it being CPU bound doing mostly sorting under 
> calculatePendingRanges
> * We were never able to exactly reproduce it with normal RING_DELAY and 
> gossip intervals, even on a 184 node cluster (the production cluster is 
> around 180).
> * Dropping RING_DELAY and in particular dropping gossip interval to 10 ms 
> instead of 1000 ms, we were able to observe all of the behavior we saw in 
> production.
> So our steps to reproduce are:
> * Launch 184 node cluster w/ gossip interval at 10ms and RING_DELAY at 1 
> second.
> * Do something like: {{while [ 1 ] ; do date ; echo decom ; nodetool 
> decommission ; date ; echo done leaving decommed for a while ; sleep 3 ; date 
> ; echo done restarting; sudo rm -rf /data/disk1/commitlog/* ; sudo rm -rf 
> /data/diskarray/tables/* ; sudo monit restart cassandra ;date ; echo 
> restarted waiting for a while ; sleep 40; done}} (or just do a manual 
> decom/bootstrap once, it triggers every time)
> * Watch all nodes flap massively and not recover at all, or maybe after a 
> *long* time.
> I observed the flapping using a python script that every 5 seconds
> (randomly spread out) asked for unreachable nodes from *all* nodes in
> the cluster, and printed a

[jira] [Issue Comment Edited] (CASSANDRA-3851) Wrong Keyspace name is generated while streaming the sstables using BulkOutputFormat.

2012-02-04 Thread Brandon Williams (Issue Comment Edited) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13200517#comment-13200517
 ] 

Brandon Williams edited comment on CASSANDRA-3851 at 2/4/12 7:46 PM:
-

Right, when CASSANDRA-3740 is resolved, it will get merged into 1.1 so this 
won't be an issue.

  was (Author: brandon.williams):
Right, when CASSANDRA-3740 is resolved, it will get merged into 1.1 so this 
won't be an issue.
  
> Wrong Keyspace name is generated while streaming the sstables using 
> BulkOutputFormat.
> -
>
> Key: CASSANDRA-3851
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3851
> Project: Cassandra
>  Issue Type: Bug
>  Components: Hadoop, Tools
>Affects Versions: 1.1
>Reporter: Samarth Gahire
>Assignee: Brandon Williams
>Priority: Minor
>  Labels: bulkloader, hadoop, sstableloader
> Fix For: 1.1
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> I have merged the committed changes of 
> [CASSANDRA-3828|https://issues.apache.org/jira/browse/CASSANDRA-3828] into my 
> cassandra-trunk, along with the changes for the OutputLocation.
> But when I tried to load the sstables with hadoop job it results into the 
> following exception:
> {code}
> 12/02/04 11:19:12 INFO mapred.JobClient:  map 6% reduce 0%
> 12/02/04 11:19:14 INFO mapred.JobClient: Task Id : 
> attempt_201202041114_0001_m_01_1, Status : FAILED
> java.lang.RuntimeException: Could not retrieve endpoint ranges:
> at 
> org.apache.cassandra.hadoop.BulkRecordWriter$ExternalClient.init(BulkRecordWriter.java:252)
> at 
> org.apache.cassandra.io.sstable.SSTableLoader.stream(SSTableLoader.java:117)
> at 
> org.apache.cassandra.io.sstable.SSTableLoader.stream(SSTableLoader.java:112)
> at 
> org.apache.cassandra.hadoop.BulkRecordWriter.close(BulkRecordWriter.java:182)
> at 
> org.apache.cassandra.hadoop.BulkRecordWriter.close(BulkRecordWriter.java:167)
> at 
> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.close(MapTask.java:650)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:765)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
> at org.apache.hadoop.mapred.Child.main(Child.java:253)
> Caused by: InvalidRequestException (*why:There is no ring for the keyspace: 
> tmp*)
> at 
> org.apache.cassandra.thrift.Cassandra$describe_ring_result.read(Cassandra.java:24053)
> at 
> org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
> at 
> org.apache.cassandra.thrift.Cassandra$Client.recv_describe_ring(Cassandra.java:1065)
> at 
> org.apache.cassandra.thrift.Cassandra$Client.describe_ring(Cassandra.java:1052)
> at 
> org.apache.cassandra.hadoop.BulkRecordWriter$ExternalClient.init(BulkRecordWriter.java:225)
> ... 12 more
> {code}
> After looking into the code I figured out that as we are setting the 
> OUTPUTLOCATION with system property "java.io.tmpdir" the output directory is 
> getting created as: /tmp/Keyspace_Name
> So in SSTableLoader, while generating the keyspace name like
> {code}
> this.keyspace = directory.getParentFile().getName();
> {code}
> It is setting the keyspace name as "tmp" and results into the above exception.
> I have changed the code as:
> {code}this.keyspace = directory.getName();{code}
> and it works perfectly.
> But I am wondering how it was working fine previously. Am I doing anything 
> wrong? Or is it a bug? 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (CASSANDRA-3851) Wrong Keyspace name is generated while streaming the sstables using BulkOutputFormat.

2012-02-04 Thread Brandon Williams (Issue Comment Edited) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13200517#comment-13200517
 ] 

Brandon Williams edited comment on CASSANDRA-3851 at 2/4/12 7:46 PM:
-

Right, when CASSANDRA-3740 is resolved, it will get merged into 1.1 so this 
won't be an issue.

  was (Author: brandon.williams):
Right, when CASSANDRA-3828 is resolved, it will get merged into 1.1 so this 
won't be an issue.
  
> Wrong Keyspace name is generated while streaming the sstables using 
> BulkOutputFormat.
> -
>
> Key: CASSANDRA-3851
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3851
> Project: Cassandra
>  Issue Type: Bug
>  Components: Hadoop, Tools
>Affects Versions: 1.1
>Reporter: Samarth Gahire
>Assignee: Brandon Williams
>Priority: Minor
>  Labels: bulkloader, hadoop, sstableloader
> Fix For: 1.1
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> I have merged the committed changes of 
> [CASSANDRA-3828|https://issues.apache.org/jira/browse/CASSANDRA-3828] into my 
> cassandra-trunk, along with the changes for the OutputLocation.
> But when I tried to load the sstables with hadoop job it results into the 
> following exception:
> {code}
> 12/02/04 11:19:12 INFO mapred.JobClient:  map 6% reduce 0%
> 12/02/04 11:19:14 INFO mapred.JobClient: Task Id : 
> attempt_201202041114_0001_m_01_1, Status : FAILED
> java.lang.RuntimeException: Could not retrieve endpoint ranges:
> at 
> org.apache.cassandra.hadoop.BulkRecordWriter$ExternalClient.init(BulkRecordWriter.java:252)
> at 
> org.apache.cassandra.io.sstable.SSTableLoader.stream(SSTableLoader.java:117)
> at 
> org.apache.cassandra.io.sstable.SSTableLoader.stream(SSTableLoader.java:112)
> at 
> org.apache.cassandra.hadoop.BulkRecordWriter.close(BulkRecordWriter.java:182)
> at 
> org.apache.cassandra.hadoop.BulkRecordWriter.close(BulkRecordWriter.java:167)
> at 
> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.close(MapTask.java:650)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:765)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
> at org.apache.hadoop.mapred.Child.main(Child.java:253)
> Caused by: InvalidRequestException (*why:There is no ring for the keyspace: 
> tmp*)
> at 
> org.apache.cassandra.thrift.Cassandra$describe_ring_result.read(Cassandra.java:24053)
> at 
> org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
> at 
> org.apache.cassandra.thrift.Cassandra$Client.recv_describe_ring(Cassandra.java:1065)
> at 
> org.apache.cassandra.thrift.Cassandra$Client.describe_ring(Cassandra.java:1052)
> at 
> org.apache.cassandra.hadoop.BulkRecordWriter$ExternalClient.init(BulkRecordWriter.java:225)
> ... 12 more
> {code}
> After looking into the code I figured out that as we are setting the 
> OUTPUTLOCATION with system property "java.io.tmpdir" the output directory is 
> getting created as: /tmp/Keyspace_Name
> So in SSTableLoader, while generating the keyspace name like
> {code}
> this.keyspace = directory.getParentFile().getName();
> {code}
> It is setting the keyspace name as "tmp" and results into the above exception.
> I have changed the code as:
> {code}this.keyspace = directory.getName();{code}
> and it works perfectly.
> But I am wondering how it was working fine previously. Am I doing anything 
> wrong? Or is it a bug? 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (CASSANDRA-3851) Wrong Keyspace name is generated while streaming the sstables using BulkOutputFormat.

2012-02-04 Thread Brandon Williams (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams resolved CASSANDRA-3851.
-

Resolution: Not A Problem

Right, when CASSANDRA-3828 is resolved, it will get merged into 1.1 so this 
won't be an issue.

> Wrong Keyspace name is generated while streaming the sstables using 
> BulkOutputFormat.
> -
>
> Key: CASSANDRA-3851
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3851
> Project: Cassandra
>  Issue Type: Bug
>  Components: Hadoop, Tools
>Affects Versions: 1.1
>Reporter: Samarth Gahire
>Assignee: Brandon Williams
>Priority: Minor
>  Labels: bulkloader, hadoop, sstableloader
> Fix For: 1.1
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> I have merged the committed changes of 
> [CASSANDRA-3828|https://issues.apache.org/jira/browse/CASSANDRA-3828] into my 
> cassandra-trunk, along with the changes for the OutputLocation.
> But when I tried to load the sstables with hadoop job it results into the 
> following exception:
> {code}
> 12/02/04 11:19:12 INFO mapred.JobClient:  map 6% reduce 0%
> 12/02/04 11:19:14 INFO mapred.JobClient: Task Id : 
> attempt_201202041114_0001_m_01_1, Status : FAILED
> java.lang.RuntimeException: Could not retrieve endpoint ranges:
> at 
> org.apache.cassandra.hadoop.BulkRecordWriter$ExternalClient.init(BulkRecordWriter.java:252)
> at 
> org.apache.cassandra.io.sstable.SSTableLoader.stream(SSTableLoader.java:117)
> at 
> org.apache.cassandra.io.sstable.SSTableLoader.stream(SSTableLoader.java:112)
> at 
> org.apache.cassandra.hadoop.BulkRecordWriter.close(BulkRecordWriter.java:182)
> at 
> org.apache.cassandra.hadoop.BulkRecordWriter.close(BulkRecordWriter.java:167)
> at 
> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.close(MapTask.java:650)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:765)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
> at org.apache.hadoop.mapred.Child.main(Child.java:253)
> Caused by: InvalidRequestException (*why:There is no ring for the keyspace: 
> tmp*)
> at 
> org.apache.cassandra.thrift.Cassandra$describe_ring_result.read(Cassandra.java:24053)
> at 
> org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
> at 
> org.apache.cassandra.thrift.Cassandra$Client.recv_describe_ring(Cassandra.java:1065)
> at 
> org.apache.cassandra.thrift.Cassandra$Client.describe_ring(Cassandra.java:1052)
> at 
> org.apache.cassandra.hadoop.BulkRecordWriter$ExternalClient.init(BulkRecordWriter.java:225)
> ... 12 more
> {code}
> After looking into the code I figured out that as we are setting the 
> OUTPUTLOCATION with system property "java.io.tmpdir" the output directory is 
> getting created as: /tmp/Keyspace_Name
> So in SSTableLoader, while generating the keyspace name like
> {code}
> this.keyspace = directory.getParentFile().getName();
> {code}
> It is setting the keyspace name as "tmp" and results into the above exception.
> I have changed the code as:
> {code}this.keyspace = directory.getName();{code}
> and it works perfectly.
> But I am wondering how it was working fine previously. Am I doing anything 
> wrong? Or is it a bug? 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-3849) Saved CF row cache breaks when upgrading to 1.1

2012-02-04 Thread Pavel Yaskevich (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Yaskevich updated CASSANDRA-3849:
---

Attachment: CASSANDRA-3849.patch

> Saved CF row cache breaks when upgrading to 1.1
> ---
>
> Key: CASSANDRA-3849
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3849
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 1.1
> Environment: 1 node cluster running on branch cassandra-1.0. Ubuntu. 
> both key and row caching were enabled.
>Reporter: Tyler Patterson
>Assignee: Pavel Yaskevich
> Attachments: CASSANDRA-3849.patch
>
>
> Enabled row and key caching. Used stress to insert some data. Ran nodetool 
> flush, then nodetool compact. Then read the data back to populate the cache. 
> Turned row_cache_save_period and key_cache_save_period really low to force 
> saving the cache data. I verified that the row and key cache files existed in 
> /var/lib/cassandra/saved_caches/.
> I then killed cassandra, checked out branch cassandra-1.1, compiled and tried 
> to start the node. The node failed to start, and I got this error:
> {code}
>  INFO 01:33:30,893 reading saved cache 
> /var/lib/cassandra/saved_caches/Keyspace1-Standard1-RowCache
> ERROR 01:33:31,009 Exception encountered during startup
> java.lang.AssertionError: Row cache is not enabled on column family 
> [Standard1]
>   at 
> org.apache.cassandra.db.ColumnFamilyStore.cacheRow(ColumnFamilyStore.java:1050)
>   at 
> org.apache.cassandra.db.ColumnFamilyStore.initRowCache(ColumnFamilyStore.java:383)
>   at org.apache.cassandra.db.Table.open(Table.java:122)
>   at org.apache.cassandra.db.Table.open(Table.java:100)
>   at 
> org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:204)
>   at 
> org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:353)
>   at 
> org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:107)
> java.lang.AssertionError: Row cache is not enabled on column family 
> [Standard1]
>   at 
> org.apache.cassandra.db.ColumnFamilyStore.cacheRow(ColumnFamilyStore.java:1050)
>   at 
> org.apache.cassandra.db.ColumnFamilyStore.initRowCache(ColumnFamilyStore.java:383)
>   at org.apache.cassandra.db.Table.open(Table.java:122)
>   at org.apache.cassandra.db.Table.open(Table.java:100)
>   at 
> org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:204)
>   at 
> org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:353)
>   at 
> org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:107)
> Exception encountered during startup: Row cache is not enabled on column 
> family [Standard1]
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3803) snapshot-before-compaction snapshots entire keyspace

2012-02-04 Thread Sylvain Lebresne (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13200478#comment-13200478
 ] 

Sylvain Lebresne commented on CASSANDRA-3803:
-

I think the attached patch is the wrong one.

> snapshot-before-compaction snapshots entire keyspace
> 
>
> Key: CASSANDRA-3803
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3803
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Jonathan Ellis
>Assignee: Jonathan Ellis
>Priority: Minor
>  Labels: compaction
> Fix For: 1.1
>
> Attachments: 3803.txt
>
>
> Should only snapshot the CF being compacted

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3838) Repair Streaming hangs between multiple regions

2012-02-04 Thread Sylvain Lebresne (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13200477#comment-13200477
 ] 

Sylvain Lebresne commented on CASSANDRA-3838:
-

Is there any usefulness in setting SO_TIMEOUT on the socket that is writing?

I also wonder if we really should reuse the rpc timeout for this (and my 
initial intuition is that we probably shouldn't). As far as I'm concerned, I'm 
fine with adding a new streaming_socket_timeout option for this (we don't even 
have to document it in the yaml if we consider it an advanced thing).
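
As a point of reference on that question, {{Socket#setSoTimeout}} only bounds 
blocking *reads*; a thread stuck inside {{socketWrite0}} (as in the stack trace 
quoted below) isn't interrupted by it. A minimal sketch of what a read-side 
streaming timeout buys us; the host, port and timeout value are made up for 
illustration and are not the actual config key or default:

{code}
import java.io.InputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hedged sketch: a bounded read lets a stuck stream receiver fail fast and retry.
public class StreamingReadTimeoutSketch
{
    public static void main(String[] args) throws Exception
    {
        Socket socket = new Socket();
        socket.connect(new InetSocketAddress("peer.example.com", 7000), 10000);
        socket.setSoTimeout(60000); // SO_TIMEOUT: applies to blocking reads only

        InputStream in = socket.getInputStream();
        byte[] buffer = new byte[4096];
        try
        {
            int read = in.read(buffer); // throws SocketTimeoutException after 60s of silence
            System.out.println("read " + read + " bytes");
        }
        catch (SocketTimeoutException e)
        {
            socket.close(); // tear the session down so it can be retried
        }
    }
}
{code}

Whatever value is chosen, the effect is the same: a stalled receive fails fast 
enough for the stream session to be retried instead of hanging indefinitely.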

> Repair Streaming hangs between multiple regions
> ---
>
> Key: CASSANDRA-3838
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3838
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 1.0.7
>Reporter: Vijay
>Assignee: Vijay
>Priority: Minor
> Fix For: 1.0.8
>
> Attachments: 0001-Add-streaming-socket-timeouts.patch
>
>
> Streaming hangs between datacenters, though there might be multiple reasons 
> for this, a simple fix will be to add the Socket timeout so the session can 
> retry.
> The following is the netstat of the affected node (the below output remains 
> this way for a very long period).
> [test_abrepairtest@test_abrepair--euwest1c-i-1adfb753 ~]$ nt netstats
> Mode: NORMAL
> Streaming to: /50.17.92.159
>/mnt/data/cassandra070/data/abtests/cust_allocs-hc-2221-Data.db 
> sections=7002 progress=1523325354/2475291786 - 61%
>/mnt/data/cassandra070/data/abtests/cust_allocs-hc-2233-Data.db 
> sections=4581 progress=0/595026085 - 0%
>/mnt/data/cassandra070/data/abtests/cust_allocs-g-2235-Data.db 
> sections=6631 progress=0/2270344837 - 0%
>/mnt/data/cassandra070/data/abtests/cust_allocs-hc-2239-Data.db 
> sections=6266 progress=0/2190197091 - 0%
>/mnt/data/cassandra070/data/abtests/cust_allocs-hc-2230-Data.db 
> sections=7662 progress=0/3082087770 - 0%
>/mnt/data/cassandra070/data/abtests/cust_allocs-hc-2240-Data.db 
> sections=7874 progress=0/587439833 - 0%
>/mnt/data/cassandra070/data/abtests/cust_allocs-g-2226-Data.db 
> sections=7682 progress=0/2933920085 - 0%
> "Streaming:1" daemon prio=10 tid=0x2aaac2060800 nid=0x1676 runnable 
> [0x6be85000]
>java.lang.Thread.State: RUNNABLE
> at java.net.SocketOutputStream.socketWrite0(Native Method)
> at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
> at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
> at 
> com.sun.net.ssl.internal.ssl.OutputRecord.writeBuffer(OutputRecord.java:297)
> at 
> com.sun.net.ssl.internal.ssl.OutputRecord.write(OutputRecord.java:286)
> at 
> com.sun.net.ssl.internal.ssl.SSLSocketImpl.writeRecordInternal(SSLSocketImpl.java:743)
> at 
> com.sun.net.ssl.internal.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:731)
> at 
> com.sun.net.ssl.internal.ssl.AppOutputStream.write(AppOutputStream.java:59)
> - locked <0x0006afea1bd8> (a 
> com.sun.net.ssl.internal.ssl.AppOutputStream)
> at 
> com.ning.compress.lzf.ChunkEncoder.encodeAndWriteChunk(ChunkEncoder.java:133)
> at 
> com.ning.compress.lzf.LZFOutputStream.writeCompressedBlock(LZFOutputStream.java:203)
> at 
> com.ning.compress.lzf.LZFOutputStream.flush(LZFOutputStream.java:117)
> at 
> org.apache.cassandra.streaming.FileStreamTask.stream(FileStreamTask.java:152)
> at 
> org.apache.cassandra.streaming.FileStreamTask.runMayThrow(FileStreamTask.java:91)
> at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> Streaming from: /46.51.141.51
>abtests: /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2241-Data.db 
> sections=7231 progress=0/1548922508 - 0%
>abtests: /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2231-Data.db 
> sections=4730 progress=0/296474156 - 0%
>abtests: /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2244-Data.db 
> sections=7650 progress=0/1580417610 - 0%
>abtests: /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2217-Data.db 
> sections=7682 progress=0/196689250 - 0%
>abtests: /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2220-Data.db 
> sections=7149 progress=0/478695185 - 0%
>abtests: /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2171-Data.db 
> sections=443 progress=0/78417320 - 0%
>abtests: /mnt/data/cassandra070/data/abtests/cust_allocs-g-2235-Data.db 
> sections=6631 progress=0/2270344837 - 0%
>

[jira] [Commented] (CASSANDRA-3819) Cannot restart server after making schema changes to composite CFs

2012-02-04 Thread Huy Le (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13200475#comment-13200475
 ] 

Huy Le commented on CASSANDRA-3819:
---

I did not try to reproduce it.  Another user mentioned in this thread 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cannot-start-cassandra-node-anymore-tp7150978p7226863.html
 that it happened every time.

As for me, I have 3 environments. This issue happened on an environment where 
there were uncommitted logs accumulated before the schema changes. The two other 
environments, which had their commit logs fully flushed before the restart, did 
not suffer this issue. In my environments, data was inserted into the node 
before and after the schema changes.
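
For context on the stack trace quoted in the issue description below: a 
CompositeType column name is framed as a 2-byte length, the component bytes, and 
an end-of-component byte, so rendering a column name that is not actually 
composite-encoded (for example one written under a different schema) makes the 
comparator read two arbitrary bytes as a length, which can exceed the buffer's 
remaining bytes and fail in {{Buffer.limit()}}. The sketch below is only an 
illustration of that failure mode, not the actual Cassandra code:

{code}
import java.nio.ByteBuffer;

// Illustration only (not the Cassandra classes): reading one composite component
// as <2-byte length><bytes><end-of-component byte> from a buffer that does not
// actually hold composite-encoded data.
public class CompositeReplayIllustration
{
    static ByteBuffer readComponent(ByteBuffer bb)
    {
        int length = bb.getShort() & 0xFFFF;      // the first two bytes, taken as a length
        ByteBuffer component = bb.duplicate();
        // If "length" exceeds the remaining bytes, limit() throws
        // IllegalArgumentException, matching the quoted startup failure.
        component.limit(component.position() + length);
        bb.position(bb.position() + length + 1);  // skip component + end-of-component byte
        return component;
    }

    public static void main(String[] args)
    {
        // A column name that is NOT composite-encoded, e.g. a plain 8-byte long
        // whose leading bytes happen to be large.
        ByteBuffer plainName = ByteBuffer.allocate(8);
        plainName.putLong(0x7FFF000000000001L).flip();
        readComponent(plainName);                 // -> java.lang.IllegalArgumentException
    }
}
{code}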



> Cannot restart server after making schema changes to composite CFs
> --
>
> Key: CASSANDRA-3819
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3819
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 1.0.6, 1.0.7
> Environment: Ubuntu 11.0.4
>Reporter: Huy Le
>Assignee: Sylvain Lebresne
> Fix For: 1.0.8
>
>
> This JIRA is for issue discussed in this thread 
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cannot-start-cassandra-node-anymore-tp7150978p7150978.html.
> We were using version 1.0.6.  We added a new keyspace using the built-in composite 
> data type.  We then decided to change the schema, specifically just the CF 
> names, so we dropped the keyspace.  We recreated the keyspace with different 
> CF names in the keyspace.
> There was a lot of uncommitted data in the commit logs.  The data dated back before 
> the original keyspace was created.  When we restarted the server, the server 
> failed when it read the commit logs, and the server stopped.  Here is a 
> snippet of the stack trace:
> {code}
> -3881-11e1-ac7f-12313d23ead3:true:4@1326223353559001,])}
> DEBUG 18:02:01,057 Reading mutation at 66336992
> DEBUG 18:02:01,058 replaying mutation for 
> Springpad.696d6167652d7363616c65722d6d657461: 
> {ColumnFamily(CassandraOrderedQueue 
> [0,eb321490-3881-11e1-ac7f-12313d23ead3:true:4@132622335356,])}
> DEBUG 18:02:01,058 Reading mutation at 66337118
> DEBUG 18:02:01,058 replaying mutation for 
> Springpad.737072696e674d6f64656c44617461626173652d6d657461: 
> {ColumnFamily(CassandraOrderedQueue 
> [0,80dc0cd0-3bc0-11e1-83a8-12313d23ead3:false:8@1326223386668000,])}
> DEBUG 18:02:01,058 Reading mutation at 66337255
> DEBUG 18:02:01,058 replaying mutation for 
> system.38363233616337302d336263302d313165312d303030302d323366623834646463346633:
>  {ColumnFamily(Schema 
> [Avro/Schema:false:2725@1326223386807,Backups:false:431@1326223386807,Springpad:false:10814@1326223386807,SpringpadGraph:false:2931@1326223386807,])}
> DEBUG 18:02:01,059 Reading mutation at 66354352
> DEBUG 18:02:01,059 replaying mutation for 
> system.4d6967726174696f6e73204b6579: {ColumnFamily(Migrations 
> [8623ac70-3bc0-11e1--23fb84ddc4f3:false:23728@1326223386812,])}
> DEBUG 18:02:01,059 Reading mutation at 66378184
> DEBUG 18:02:01,059 replaying mutation for 
> system.4c617374204d6967726174696f6e: {ColumnFamily(Schema [Last 
> Migration:false:16@1326223386812,])}
> DEBUG 18:02:01,059 Reading mutation at 66378302
>  INFO 18:02:01,060 Finished reading 
> /mnt/cassandra/commitlog/CommitLog-1325861435420.log
> ERROR 18:02:01,061 Exception encountered during startup
> java.lang.IllegalArgumentException
> at java.nio.Buffer.limit(Buffer.java:247)
> at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:57)
> at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:66)
> at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.getString(AbstractCompositeType.java:129)
> at org.apache.cassandra.db.Column.getString(Column.java:250)
> at 
> org.apache.cassandra.db.marshal.AbstractType.getColumnsString(AbstractType.java:137)
> at 
> org.apache.cassandra.db.ColumnFamily.toString(ColumnFamily.java:280)
> at org.apache.commons.lang.ObjectUtils.toString(ObjectUtils.java:241)
> at org.apache.commons.lang.StringUtils.join(StringUtils.java:3073)
> at org.apache.commons.lang.StringUtils.join(StringUtils.java:3133)
> at 
> org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:301)
> at 
> org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:172)
> at 
> org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:215)
> at 
> org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:356)
> at 
> org.apache.c

[jira] [Commented] (CASSANDRA-3819) Cannot restart server after making schema changes to composite CFs

2012-02-04 Thread Sylvain Lebresne (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13200473#comment-13200473
 ] 

Sylvain Lebresne commented on CASSANDRA-3819:
-

You said this happens every time. Would you have exact steps to reproduce on a 
clean database? I tried a quick 'start fresh node, create a CF with CompositeType, 
insert a column, drop the CF, stop the node, restart the node', but that didn't 
reproduce it.

> Cannot restart server after making schema changes to composite CFs
> --
>
> Key: CASSANDRA-3819
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3819
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 1.0.6, 1.0.7
> Environment: Ubuntu 11.0.4
>Reporter: Huy Le
>Assignee: Sylvain Lebresne
> Fix For: 1.0.8
>
>
> This JIRA is for issue discussed in this thread 
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cannot-start-cassandra-node-anymore-tp7150978p7150978.html.
> We were using version 1.0.6.  We added a new keyspace using the built-in composite 
> data type.  We then decided to change the schema, specifically just the CF 
> names, so we dropped the keyspace.  We recreated the keyspace with different 
> CF names in the keyspace.
> There was a lot of uncommitted data in the commit logs.  The data dated back before 
> the original keyspace was created.  When we restarted the server, the server 
> failed when it read the commit logs, and the server stopped.  Here is a 
> snippet of the stack trace:
> {code}
> -3881-11e1-ac7f-12313d23ead3:true:4@1326223353559001,])}
> DEBUG 18:02:01,057 Reading mutation at 66336992
> DEBUG 18:02:01,058 replaying mutation for 
> Springpad.696d6167652d7363616c65722d6d657461: 
> {ColumnFamily(CassandraOrderedQueue 
> [0,eb321490-3881-11e1-ac7f-12313d23ead3:true:4@132622335356,])}
> DEBUG 18:02:01,058 Reading mutation at 66337118
> DEBUG 18:02:01,058 replaying mutation for 
> Springpad.737072696e674d6f64656c44617461626173652d6d657461: 
> {ColumnFamily(CassandraOrderedQueue 
> [0,80dc0cd0-3bc0-11e1-83a8-12313d23ead3:false:8@1326223386668000,])}
> DEBUG 18:02:01,058 Reading mutation at 66337255
> DEBUG 18:02:01,058 replaying mutation for 
> system.38363233616337302d336263302d313165312d303030302d323366623834646463346633:
>  {ColumnFamily(Schema 
> [Avro/Schema:false:2725@1326223386807,Backups:false:431@1326223386807,Springpad:false:10814@1326223386807,SpringpadGraph:false:2931@1326223386807,])}
> DEBUG 18:02:01,059 Reading mutation at 66354352
> DEBUG 18:02:01,059 replaying mutation for 
> system.4d6967726174696f6e73204b6579: {ColumnFamily(Migrations 
> [8623ac70-3bc0-11e1--23fb84ddc4f3:false:23728@1326223386812,])}
> DEBUG 18:02:01,059 Reading mutation at 66378184
> DEBUG 18:02:01,059 replaying mutation for 
> system.4c617374204d6967726174696f6e: {ColumnFamily(Schema [Last 
> Migration:false:16@1326223386812,])}
> DEBUG 18:02:01,059 Reading mutation at 66378302
>  INFO 18:02:01,060 Finished reading 
> /mnt/cassandra/commitlog/CommitLog-1325861435420.log
> ERROR 18:02:01,061 Exception encountered during startup
> java.lang.IllegalArgumentException
> at java.nio.Buffer.limit(Buffer.java:247)
> at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:57)
> at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:66)
> at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.getString(AbstractCompositeType.java:129)
> at org.apache.cassandra.db.Column.getString(Column.java:250)
> at 
> org.apache.cassandra.db.marshal.AbstractType.getColumnsString(AbstractType.java:137)
> at 
> org.apache.cassandra.db.ColumnFamily.toString(ColumnFamily.java:280)
> at org.apache.commons.lang.ObjectUtils.toString(ObjectUtils.java:241)
> at org.apache.commons.lang.StringUtils.join(StringUtils.java:3073)
> at org.apache.commons.lang.StringUtils.join(StringUtils.java:3133)
> at 
> org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:301)
> at 
> org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:172)
> at 
> org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:215)
> at 
> org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:356)
> at 
> org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:107)
> Exception encountered during startup: null 
> {code}
> Sample original CF schema:
> {code}
> create column family InEdges
>   with column_type = 'Standard'
>   and comparator = 
> 'CompositeType(org.apache.cassandra.db.marshal.LongTyp

[jira] [Updated] (CASSANDRA-3849) Saved CF row cache breaks when upgrading to 1.1

2012-02-04 Thread Pavel Yaskevich (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Yaskevich updated CASSANDRA-3849:
---

Reviewer: slebresne
Assignee: Pavel Yaskevich  (was: Sylvain Lebresne)

> Saved CF row cache breaks when upgrading to 1.1
> ---
>
> Key: CASSANDRA-3849
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3849
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 1.1
> Environment: 1 node cluster running on branch cassandra-1.0. Ubuntu. 
> both key and row caching were enabled.
>Reporter: Tyler Patterson
>Assignee: Pavel Yaskevich
>
> Enabled row and key caching. Used stress to insert some data. Ran nodetool 
> flush, then nodetool compact. Then read the data back to populate the cache. 
> Turned row_cache_save_period and key_cache_save_period really low to force 
> saving the cache data. I verified that the row and key cache files existed in 
> /var/lib/cassandra/saved_caches/.
> I then killed cassandra, checked out branch cassandra-1.1, compiled and tried 
> to start the node. The node failed to start, and I got this error:
> {code}
>  INFO 01:33:30,893 reading saved cache 
> /var/lib/cassandra/saved_caches/Keyspace1-Standard1-RowCache
> ERROR 01:33:31,009 Exception encountered during startup
> java.lang.AssertionError: Row cache is not enabled on column family 
> [Standard1]
>   at 
> org.apache.cassandra.db.ColumnFamilyStore.cacheRow(ColumnFamilyStore.java:1050)
>   at 
> org.apache.cassandra.db.ColumnFamilyStore.initRowCache(ColumnFamilyStore.java:383)
>   at org.apache.cassandra.db.Table.open(Table.java:122)
>   at org.apache.cassandra.db.Table.open(Table.java:100)
>   at 
> org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:204)
>   at 
> org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:353)
>   at 
> org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:107)
> java.lang.AssertionError: Row cache is not enabled on column family 
> [Standard1]
>   at 
> org.apache.cassandra.db.ColumnFamilyStore.cacheRow(ColumnFamilyStore.java:1050)
>   at 
> org.apache.cassandra.db.ColumnFamilyStore.initRowCache(ColumnFamilyStore.java:383)
>   at org.apache.cassandra.db.Table.open(Table.java:122)
>   at org.apache.cassandra.db.Table.open(Table.java:100)
>   at 
> org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:204)
>   at 
> org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:353)
>   at 
> org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:107)
> Exception encountered during startup: Row cache is not enabled on column 
> family [Standard1]
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (CASSANDRA-3849) Saved CF row cache breaks when upgrading to 1.1

2012-02-04 Thread Jonathan Ellis (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis reassigned CASSANDRA-3849:
-

Assignee: Sylvain Lebresne

> Saved CF row cache breaks when upgrading to 1.1
> ---
>
> Key: CASSANDRA-3849
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3849
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 1.1
> Environment: 1 node cluster running on branch cassandra-1.0. Ubuntu. 
> both key and row caching were enabled.
>Reporter: Tyler Patterson
>Assignee: Sylvain Lebresne
>
> Enabled row and key caching. Used stress to insert some data. Ran nodetool 
> flush, then nodetool compact. Then read the data back to populate the cache. 
> Turned row_cache_save_period and key_cache_save_period really low to force 
> saving the cache data. I verified that the row and key cache files existed in 
> /var/lib/cassandra/saved_caches/.
> I then killed cassandra, checked out branch cassandra-1.1, compiled and tried 
> to start the node. The node failed to start, and I got this error:
> {code}
>  INFO 01:33:30,893 reading saved cache 
> /var/lib/cassandra/saved_caches/Keyspace1-Standard1-RowCache
> ERROR 01:33:31,009 Exception encountered during startup
> java.lang.AssertionError: Row cache is not enabled on column family 
> [Standard1]
>   at 
> org.apache.cassandra.db.ColumnFamilyStore.cacheRow(ColumnFamilyStore.java:1050)
>   at 
> org.apache.cassandra.db.ColumnFamilyStore.initRowCache(ColumnFamilyStore.java:383)
>   at org.apache.cassandra.db.Table.open(Table.java:122)
>   at org.apache.cassandra.db.Table.open(Table.java:100)
>   at 
> org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:204)
>   at 
> org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:353)
>   at 
> org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:107)
> java.lang.AssertionError: Row cache is not enabled on column family 
> [Standard1]
>   at 
> org.apache.cassandra.db.ColumnFamilyStore.cacheRow(ColumnFamilyStore.java:1050)
>   at 
> org.apache.cassandra.db.ColumnFamilyStore.initRowCache(ColumnFamilyStore.java:383)
>   at org.apache.cassandra.db.Table.open(Table.java:122)
>   at org.apache.cassandra.db.Table.open(Table.java:100)
>   at 
> org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:204)
>   at 
> org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:353)
>   at 
> org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:107)
> Exception encountered during startup: Row cache is not enabled on column 
> family [Standard1]
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3831) scaling to large clusters in GossipStage impossible due to calculatePendingRanges

2012-02-04 Thread Jonathan Ellis (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13200464#comment-13200464
 ] 

Jonathan Ellis commented on CASSANDRA-3831:
---

calculatePendingRanges is only supposed to be called when the ring changes.  So 
I'd say the right fix would be to eliminate whatever is breaking that design, 
rather than adding a memoization bandaid.

(I eyeballed 1.1 and didn't see anything obvious, so either it's subtle or it 
got fixed post-0.8.)

I don't suppose your CPU spinning test got any more of a call tree to go on?

> scaling to large clusters in GossipStage impossible due to 
> calculatePendingRanges 
> --
>
> Key: CASSANDRA-3831
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3831
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Peter Schuller
>Assignee: Peter Schuller
>Priority: Critical
> Attachments: CASSANDRA-3831-memoization-not-for-inclusion.txt
>
>
> (most observations below are from 0.8, but I just now tested on
> trunk and I can trigger this problem *just* by bootstrapping a ~180
> node cluster concurrently, presumably due to the number of nodes that
> are simultaneously in bootstrap state)
> It turns out that:
> * (1) calculatePendingRanges is not just expensive, it's computationally 
> complex - cubic or worse
> * (2) it gets called *NOT* just once per node being bootstrapped/leaving etc, 
> but is called repeatedly *while* nodes are in these states
> As a result, clusters start exploding when you start reaching 100-300
> nodes. The GossipStage will get backed up because a single
> calculatePendingRanges takes seconds, and depending on what the
> average heartbeat interval is in relation to this, this can lead to
> *massive* cluster-wide flapping.
> This all started because we hit this in production; several nodes
> would start flapping several other nodes as down, with many nodes
> seeing the entire cluster, or a large portion of it, as down. Logging
> in to some of these nodes you would see that they would be constantly
> flapping up/down for minutes at a time until one became lucky and it
> stabilized.
> In the end we had to perform an emergency full-cluster restart with
> gossip patched to force-forget certain nodes in bootstrapping state.
> I can't go into all details here from the post-mortem (just the
> write-up would take a day), but in short:
> * We graphed the number of hosts in the cluster that had more than 5
>   Down (in a cluster that should have 0 down) on a minutely timeline.
> * We also graphed the number of hosts in the cluster that had GossipStage 
> backed up.
> * The two graphs correlated *extremely* well
> * jstack sampling showed it being CPU bound doing mostly sorting under 
> calculatePendingRanges
> * We were never able to exactly reproduce it with normal RING_DELAY and 
> gossip intervals, even on a 184 node cluster (the production cluster is 
> around 180).
> * Dropping RING_DELAY and in particular dropping gossip interval to 10 ms 
> instead of 1000 ms, we were able to observe all of the behavior we saw in 
> production.
> So our steps to reproduce are:
> * Launch 184 node cluster w/ gossip interval at 10ms and RING_DELAY at 1 
> second.
> * Do something like: {{while [ 1 ] ; do date ; echo decom ; nodetool 
> decommission ; date ; echo done leaving decommed for a while ; sleep 3 ; date 
> ; echo done restarting; sudo rm -rf /data/disk1/commitlog/* ; sudo rm -rf 
> /data/diskarray/tables/* ; sudo monit restart cassandra ;date ; echo 
> restarted waiting for a while ; sleep 40; done}} (or just do a manual 
> decom/bootstrap once, it triggers every time)
> * Watch all nodes flap massively and not recover at all, or maybe after a 
> *long* time.
> I observed the flapping using a python script that every 5 seconds
> (randomly spread out) asked for unreachable nodes from *all* nodes in
> the cluster, and printed any nodes and their counts when they had
> unreachables > 5. The cluster can be observed instantly going into
> massive flapping when leaving/bootstrap is initiated. Script needs
> Cassandra running with Jolokia enabled for http/json access to
> JMX. Can provide script if needed after cleanup.
> The phi conviction, based on logging I added, was legitimate. Using
> the 10 ms interval the average heartbeat interval ends up being like 25
> ms or something like that. As a result, a single ~ 2 second delay in
> gossip stage is huge in comparison to those 25 ms, and so we go past
> the phi conviction threshold. This is much more sensitive than in
> production, but it's the *same* effect, even if it triggers less
> easily for real.
> The best work around currently internally is to memoize
> calculatePendingRanges so that w

[jira] [Updated] (CASSANDRA-3846) cqlsh can't show data under python2.5, python2.6

2012-02-04 Thread Jonathan Ellis (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-3846:
--

Reviewer: brandon.williams

> cqlsh can't show data under python2.5, python2.6
> 
>
> Key: CASSANDRA-3846
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3846
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tools
>Reporter: paul cannon
>Assignee: paul cannon
>Priority: Minor
>  Labels: cqlsh
> Fix For: 1.0.8
>
> Attachments: 3846.patch.txt
>
>
> Kris Hahn discovered a python2.6-ism in recent cqlsh changes:
> {code}
> bval = escapedval.encode(output_encoding, errors='backslashreplace')
> {code}
> before python2.7, str.encode() didn't accept a keyword argument for the 
> second parameter. the semantics are the same without naming the parameter, 
> though, so removing the "errors=" bit should suffice to make it run right.
> does not affect any released version, but does affect HEAD of cassandra-1.0, 
> cassandra-1.1, and trunk.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3845) AssertionError in ExpiringMap during inserts

2012-02-04 Thread Jonathan Ellis (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13200463#comment-13200463
 ] 

Jonathan Ellis commented on CASSANDRA-3845:
---

Well, that sucks.

Looks like you need to report this to the high-scale-lib project.

> AssertionError in ExpiringMap during inserts
> 
>
> Key: CASSANDRA-3845
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3845
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 1.1
>Reporter: Brandon Williams
>
> {noformat}
> ERROR 20:34:43,981 Fatal exception in thread Thread[Thrift:17,5,main]
> java.lang.AssertionError
> at 
> org.cliffc.high_scale_lib.NonBlockingHashMap$CHM.copy_check_and_promote(NonBlockingHashMap.java:982)
> at 
> org.cliffc.high_scale_lib.NonBlockingHashMap$CHM.help_copy_impl(NonBlockingHashMap.java:936)
> at 
> org.cliffc.high_scale_lib.NonBlockingHashMap$CHM.access$500(NonBlockingHashMap.java:707)
> at 
> org.cliffc.high_scale_lib.NonBlockingHashMap.help_copy(NonBlockingHashMap.java:700)
> at 
> org.cliffc.high_scale_lib.NonBlockingHashMap.putIfMatch(NonBlockingHashMap.java:613)
> at 
> org.cliffc.high_scale_lib.NonBlockingHashMap.putIfMatch(NonBlockingHashMap.java:348)
> at 
> org.cliffc.high_scale_lib.NonBlockingHashMap.put(NonBlockingHashMap.java:311)
> at org.apache.cassandra.utils.ExpiringMap.put(ExpiringMap.java:152)
> at 
> org.apache.cassandra.net.MessagingService.addCallback(MessagingService.java:354)
> at 
> org.apache.cassandra.net.MessagingService.sendRR(MessagingService.java:392)
> at 
> org.apache.cassandra.net.MessagingService.sendRR(MessagingService.java:374)
> at 
> org.apache.cassandra.service.StorageProxy.sendMessages(StorageProxy.java:410)
> at 
> org.apache.cassandra.service.StorageProxy.sendToHintedEndpoints(StorageProxy.java:339)
> at 
> org.apache.cassandra.service.StorageProxy$2.apply(StorageProxy.java:120)
> at 
> org.apache.cassandra.service.StorageProxy.performWrite(StorageProxy.java:255)
> at 
> org.apache.cassandra.service.StorageProxy.mutate(StorageProxy.java:194)
> at 
> org.apache.cassandra.thrift.CassandraServer.doInsert(CassandraServer.java:638)
> at 
> org.apache.cassandra.thrift.CassandraServer.internal_batch_mutate(CassandraServer.java:589)
> at 
> org.apache.cassandra.thrift.CassandraServer.batch_mutate(CassandraServer.java:597)
> at 
> org.apache.cassandra.thrift.Cassandra$Processor$batch_mutate.getResult(Cassandra.java:3112)
> at 
> org.apache.cassandra.thrift.Cassandra$Processor$batch_mutate.getResult(Cassandra.java:3100)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:32)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:34)
> at 
> org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (CASSANDRA-3850) get_indexed_slices loses index expressions

2012-02-04 Thread Sylvain Lebresne (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne resolved CASSANDRA-3850.
-

   Resolution: Fixed
Fix Version/s: 1.1
 Reviewer: slebresne

Committed, thanks.

(don't hesitate to attach the patch to the issue next time, it's slightly more 
convenient :))

> get_indexed_slices loses index expressions
> --
>
> Key: CASSANDRA-3850
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3850
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Philip Andronov
>  Labels: get, indexing, search
> Fix For: 1.1
>
>
> In trunk, 
> CassandraServer.get_indexed_slices(ColumnParent, IndexClause, 
> SlicePredicate, ConsistencyLevel)
> loses index_clause.expressions when constructing the 
> RangeSliceCommand, because it uses the wrong constructor.
> This makes the examples on http://wiki.apache.org/cassandra/CassandraCli produce 
> wrong output, as does any get involving a "where" check.
> Patch to fix this issue http://pastebin.com/QQT0Tfpc

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[1/2] git commit: Merge branch 'cassandra-1.1' into trunk

2012-02-04 Thread slebresne
Updated Branches:
  refs/heads/trunk e12d4430e -> 0ba454168


Merge branch 'cassandra-1.1' into trunk


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/0ba45416
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/0ba45416
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/0ba45416

Branch: refs/heads/trunk
Commit: 0ba4541685640043a4e555d049261e36ca8d8e3b
Parents: e12d443 cbac7af
Author: Sylvain Lebresne 
Authored: Sat Feb 4 15:26:58 2012 +0100
Committer: Sylvain Lebresne 
Committed: Sat Feb 4 15:26:58 2012 +0100

--
 .../apache/cassandra/thrift/CassandraServer.java   |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)
--




[2/2] git commit: Don't ignore index expressions in get_indexed_slices

2012-02-04 Thread slebresne
Don't ignore index expressions in get_indexed_slices

patch by Philip Andronov; reviewed by slebresne for CASSANDRA-3850


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/cbac7af7
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/cbac7af7
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/cbac7af7

Branch: refs/heads/trunk
Commit: cbac7af793f8a726f051d6dbf36a382fac595c45
Parents: 35aad40
Author: Sylvain Lebresne 
Authored: Sat Feb 4 15:23:45 2012 +0100
Committer: Sylvain Lebresne 
Committed: Sat Feb 4 15:26:16 2012 +0100

--
 .../apache/cassandra/thrift/CassandraServer.java   |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/cbac7af7/src/java/org/apache/cassandra/thrift/CassandraServer.java
--
diff --git a/src/java/org/apache/cassandra/thrift/CassandraServer.java 
b/src/java/org/apache/cassandra/thrift/CassandraServer.java
index 03e6728..f30a130 100644
--- a/src/java/org/apache/cassandra/thrift/CassandraServer.java
+++ b/src/java/org/apache/cassandra/thrift/CassandraServer.java
@@ -802,6 +802,7 @@ public class CassandraServer implements Cassandra.Iface
   null,
   column_predicate,
   bounds,
+  
index_clause.expressions,
   index_clause.count);
 
 List<Row> rows;



git commit: Don't ignore index expressions in get_indexed_slices

2012-02-04 Thread slebresne
Updated Branches:
  refs/heads/cassandra-1.1 35aad40b0 -> cbac7af79


Don't ignore index expressions in get_indexed_slices

patch by Philip Andronov; reviewed by slebresne for CASSANDRA-3850


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/cbac7af7
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/cbac7af7
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/cbac7af7

Branch: refs/heads/cassandra-1.1
Commit: cbac7af793f8a726f051d6dbf36a382fac595c45
Parents: 35aad40
Author: Sylvain Lebresne 
Authored: Sat Feb 4 15:23:45 2012 +0100
Committer: Sylvain Lebresne 
Committed: Sat Feb 4 15:26:16 2012 +0100

--
 .../apache/cassandra/thrift/CassandraServer.java   |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/cbac7af7/src/java/org/apache/cassandra/thrift/CassandraServer.java
--
diff --git a/src/java/org/apache/cassandra/thrift/CassandraServer.java 
b/src/java/org/apache/cassandra/thrift/CassandraServer.java
index 03e6728..f30a130 100644
--- a/src/java/org/apache/cassandra/thrift/CassandraServer.java
+++ b/src/java/org/apache/cassandra/thrift/CassandraServer.java
@@ -802,6 +802,7 @@ public class CassandraServer implements Cassandra.Iface
   null,
   column_predicate,
   bounds,
+  
index_clause.expressions,
   index_clause.count);
 
 List<Row> rows;



[jira] [Commented] (CASSANDRA-3851) Wrong Keyspace name is generated while streaming the sstables using BulkOutputFormat.

2012-02-04 Thread Samarth Gahire (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13200442#comment-13200442
 ] 

Samarth Gahire commented on CASSANDRA-3851:
---

Got it.
Actually in cassandra-trunk we are handling it as 
{code}
File outputdir = new File(getOutputLocation() + File.separator + keyspace + 
File.separator + ConfigHelper.getOutputColumnFamily(conf)); //dir must be named 
by ks/cf for the loader
{code}
That is the reason it creates the keyspace name properly.
So it's a bug in cassandra-1.1.
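
To make the difference concrete, here is a tiny illustration (the paths and 
keyspace name are made up): the loader infers the keyspace from the parent 
directory of the SSTable directory, so the output has to be laid out as 
<output>/<keyspace>/<column family>. When the keyspace directory sits directly 
under java.io.tmpdir, the parent is "tmp", and describe_ring then fails with 
"There is no ring for the keyspace: tmp".

{code}
import java.io.File;

// Illustrative paths only; this just shows what
// directory.getParentFile().getName() yields for the two layouts.
public class KeyspaceFromDirectory
{
    public static void main(String[] args)
    {
        // Layout the trunk BulkRecordWriter produces: <output>/<keyspace>/<column family>
        File good = new File("/tmp/job-output/MyKeyspace/MyColumnFamily");
        System.out.println(good.getParentFile().getName()); // -> MyKeyspace

        // Layout from the 1.1 code path, with the keyspace directory directly under java.io.tmpdir
        File bad = new File("/tmp/MyKeyspace");
        System.out.println(bad.getParentFile().getName());  // -> tmp
    }
}
{code}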

> Wrong Keyspace name is generated while streaming the sstables using 
> BulkOutputFormat.
> -
>
> Key: CASSANDRA-3851
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3851
> Project: Cassandra
>  Issue Type: Bug
>  Components: Hadoop, Tools
>Affects Versions: 1.1
>Reporter: Samarth Gahire
>Assignee: Brandon Williams
>Priority: Minor
>  Labels: bulkloader, hadoop, sstableloader
> Fix For: 1.1
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> I have merged the committed changes of 
> [CASSANDRA-3828|https://issues.apache.org/jira/browse/CASSANDRA-3828] into my 
> cassandra-trunk, along with the changes for the OutputLocation.
> But when I tried to load the sstables with hadoop job it results into the 
> following exception:
> {code}
> 12/02/04 11:19:12 INFO mapred.JobClient:  map 6% reduce 0%
> 12/02/04 11:19:14 INFO mapred.JobClient: Task Id : 
> attempt_201202041114_0001_m_01_1, Status : FAILED
> java.lang.RuntimeException: Could not retrieve endpoint ranges:
> at 
> org.apache.cassandra.hadoop.BulkRecordWriter$ExternalClient.init(BulkRecordWriter.java:252)
> at 
> org.apache.cassandra.io.sstable.SSTableLoader.stream(SSTableLoader.java:117)
> at 
> org.apache.cassandra.io.sstable.SSTableLoader.stream(SSTableLoader.java:112)
> at 
> org.apache.cassandra.hadoop.BulkRecordWriter.close(BulkRecordWriter.java:182)
> at 
> org.apache.cassandra.hadoop.BulkRecordWriter.close(BulkRecordWriter.java:167)
> at 
> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.close(MapTask.java:650)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:765)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
> at org.apache.hadoop.mapred.Child.main(Child.java:253)
> Caused by: InvalidRequestException (*why:There is no ring for the keyspace: 
> tmp*)
> at 
> org.apache.cassandra.thrift.Cassandra$describe_ring_result.read(Cassandra.java:24053)
> at 
> org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
> at 
> org.apache.cassandra.thrift.Cassandra$Client.recv_describe_ring(Cassandra.java:1065)
> at 
> org.apache.cassandra.thrift.Cassandra$Client.describe_ring(Cassandra.java:1052)
> at 
> org.apache.cassandra.hadoop.BulkRecordWriter$ExternalClient.init(BulkRecordWriter.java:225)
> ... 12 more
> {code}
> After looking into the code, I figured out that because we are setting the 
> OUTPUTLOCATION from the system property "java.io.tmpdir", the output directory 
> gets created as: /tmp/Keyspace_Name
> So in SSTableLoader, while deriving the keyspace name via
> {code}
> this.keyspace = directory.getParentFile().getName();
> {code}
> it sets the keyspace name to "tmp", which results in the above exception.
> I have changed the code to:
> {code}this.keyspace = directory.getName();{code}
> and it works perfectly.
> But I am wondering how it was working fine previously. Am I doing anything 
> wrong, or is it a bug?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (CASSANDRA-3851) Wrong Keyspace name is generated while streaming the sstables using BulkOutputFormat.

2012-02-04 Thread Samarth Gahire (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Samarth Gahire reassigned CASSANDRA-3851:
-

Assignee: Brandon Williams  (was: Samarth Gahire)

> Wrong Keyspace name is generated while streaming the sstables using 
> BulkOutputFormat.
> -
>
> Key: CASSANDRA-3851
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3851
> Project: Cassandra
>  Issue Type: Bug
>  Components: Hadoop, Tools
>Affects Versions: 1.1
>Reporter: Samarth Gahire
>Assignee: Brandon Williams
>Priority: Minor
>  Labels: bulkloader, hadoop, sstableloader
> Fix For: 1.1
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> I have merged the committed changes of 
> [CASSANDRA-3828|https://issues.apache.org/jira/browse/CASSANDRA-3828] into my 
> cassandra-trunk, along with the changes for the OutputLocation.
> But when I try to load the sstables with a Hadoop job, it fails with the 
> following exception:
> {code}
> 12/02/04 11:19:12 INFO mapred.JobClient:  map 6% reduce 0%
> 12/02/04 11:19:14 INFO mapred.JobClient: Task Id : 
> attempt_201202041114_0001_m_01_1, Status : FAILED
> java.lang.RuntimeException: Could not retrieve endpoint ranges:
> at 
> org.apache.cassandra.hadoop.BulkRecordWriter$ExternalClient.init(BulkRecordWriter.java:252)
> at 
> org.apache.cassandra.io.sstable.SSTableLoader.stream(SSTableLoader.java:117)
> at 
> org.apache.cassandra.io.sstable.SSTableLoader.stream(SSTableLoader.java:112)
> at 
> org.apache.cassandra.hadoop.BulkRecordWriter.close(BulkRecordWriter.java:182)
> at 
> org.apache.cassandra.hadoop.BulkRecordWriter.close(BulkRecordWriter.java:167)
> at 
> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.close(MapTask.java:650)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:765)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
> at org.apache.hadoop.mapred.Child.main(Child.java:253)
> Caused by: InvalidRequestException (*why:There is no ring for the keyspace: 
> tmp*)
> at 
> org.apache.cassandra.thrift.Cassandra$describe_ring_result.read(Cassandra.java:24053)
> at 
> org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
> at 
> org.apache.cassandra.thrift.Cassandra$Client.recv_describe_ring(Cassandra.java:1065)
> at 
> org.apache.cassandra.thrift.Cassandra$Client.describe_ring(Cassandra.java:1052)
> at 
> org.apache.cassandra.hadoop.BulkRecordWriter$ExternalClient.init(BulkRecordWriter.java:225)
> ... 12 more
> {code}
> After looking into the code, I figured out that because we are setting the 
> OUTPUTLOCATION from the system property "java.io.tmpdir", the output directory 
> gets created as: /tmp/Keyspace_Name
> So in SSTableLoader, while deriving the keyspace name via
> {code}
> this.keyspace = directory.getParentFile().getName();
> {code}
> it sets the keyspace name to "tmp", which results in the above exception.
> I have changed the code to:
> {code}this.keyspace = directory.getName();{code}
> and it works perfectly.
> But I am wondering how it was working fine previously. Am I doing anything 
> wrong, or is it a bug?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (CASSANDRA-3851) Wrong Keyspace name is generated while streaming the sstables using BulkOutputFormat.

2012-02-04 Thread Samarth Gahire (Created) (JIRA)
Wrong Keyspace name is generated while streaming the sstables using 
BulkOutputFormat.
-

 Key: CASSANDRA-3851
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3851
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop, Tools
Affects Versions: 1.1
Reporter: Samarth Gahire
Assignee: Samarth Gahire
Priority: Minor
 Fix For: 1.1


I have merged the committed changes of 
[CASSANDRA-3828|https://issues.apache.org/jira/browse/CASSANDRA-3828] into my 
cassandra-trunk, along with the changes for the OutputLocation.
But when I try to load the sstables with a Hadoop job, it fails with the 
following exception:
{code}
12/02/04 11:19:12 INFO mapred.JobClient:  map 6% reduce 0%
12/02/04 11:19:14 INFO mapred.JobClient: Task Id : 
attempt_201202041114_0001_m_01_1, Status : FAILED
java.lang.RuntimeException: Could not retrieve endpoint ranges:
at 
org.apache.cassandra.hadoop.BulkRecordWriter$ExternalClient.init(BulkRecordWriter.java:252)
at 
org.apache.cassandra.io.sstable.SSTableLoader.stream(SSTableLoader.java:117)
at 
org.apache.cassandra.io.sstable.SSTableLoader.stream(SSTableLoader.java:112)
at 
org.apache.cassandra.hadoop.BulkRecordWriter.close(BulkRecordWriter.java:182)
at 
org.apache.cassandra.hadoop.BulkRecordWriter.close(BulkRecordWriter.java:167)
at 
org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.close(MapTask.java:650)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:765)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
at org.apache.hadoop.mapred.Child.main(Child.java:253)
Caused by: InvalidRequestException (*why:There is no ring for the keyspace: 
tmp*)
at 
org.apache.cassandra.thrift.Cassandra$describe_ring_result.read(Cassandra.java:24053)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
at 
org.apache.cassandra.thrift.Cassandra$Client.recv_describe_ring(Cassandra.java:1065)
at 
org.apache.cassandra.thrift.Cassandra$Client.describe_ring(Cassandra.java:1052)
at 
org.apache.cassandra.hadoop.BulkRecordWriter$ExternalClient.init(BulkRecordWriter.java:225)
... 12 more
{code}

After looking into the code, I figured out that because we are setting the 
OUTPUTLOCATION from the system property "java.io.tmpdir", the output directory 
gets created as: /tmp/Keyspace_Name
So in SSTableLoader, while deriving the keyspace name via
{code}
this.keyspace = directory.getParentFile().getName();
{code}

it sets the keyspace name to "tmp", which results in the above exception.

I have changed the code to:
{code}this.keyspace = directory.getName();{code}
and it works perfectly.
But I am wondering how it was working fine previously. Am I doing anything 
wrong, or is it a bug?


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3791) Support query by names for compact CF

2012-02-04 Thread Sylvain Lebresne (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13200380#comment-13200380
 ] 

Sylvain Lebresne commented on CASSANDRA-3791:
-

It's not really important but for info, the test seems to fail here with
{noformat}
  File "/Users/mcmanus/Git/scqeal/scqeal/tests/test_select.py", line 370, in 
compactcf_query_by_names_test
self.assertIn(1328129600682, list(column_values(rows)))
AttributeError: 'TestSelect' object has no attribute 'assertIn'
{noformat}

However, I looked at the queries and results for that test and didn't see 
anything wrong with what is returned; rather, the test is asserting the wrong things. 
When inserting data into the 'Clicks' CF, for the second userid 
(d9c9dced-d539-4bce-9ccb-dc59a9e9136f), you update the same column multiple 
times. The inserts look like:
{noformat}
INSERT INTO Clicks (userid, url, time) VALUES 
(d9c9dced-d539-4bce-9ccb-dc59a9e9136f, 'http://apache.org', 1328129787426);
INSERT INTO Clicks (userid, url, time) VALUES 
(d9c9dced-d539-4bce-9ccb-dc59a9e9136f, 'http://apache.org', 1328129787427);
INSERT INTO Clicks (userid, url, time) VALUES 
(d9c9dced-d539-4bce-9ccb-dc59a9e9136f, 'http://apache.org', 1328129787428);
INSERT INTO Clicks (userid, url, time) VALUES 
(d9c9dced-d539-4bce-9ccb-dc59a9e9136f, 'http://apache.org', 1328129787429);
{noformat}

Because the PK is (userid, url), those lines are actually overwriting the same 
column over and over again. Hence when the test does a select and expects 7 rows 
in return, it only gets 2, which is the right behavior (and as far as I can 
tell, IN works correctly in that test).
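
To illustrate the overwrite semantics, here is a toy sketch that models a single compact-CF partition as a map (hypothetical code for illustration only, not actual Cassandra internals):
{code}
import java.util.HashMap;
import java.util.Map;

public class OverwriteDemo
{
    public static void main(String[] args)
    {
        // With PRIMARY KEY (userid, url), each (userid, url) pair names exactly one
        // column; re-inserting it overwrites the value instead of adding a new row.
        Map<String, Long> clicksForOneUser = new HashMap<String, Long>();

        clicksForOneUser.put("http://apache.org", 1328129787426L);
        clicksForOneUser.put("http://apache.org", 1328129787427L);
        clicksForOneUser.put("http://apache.org", 1328129787428L);
        clicksForOneUser.put("http://apache.org", 1328129787429L);

        // Only one entry survives, so a SELECT over this userid yields one row, not four.
        System.out.println(clicksForOneUser.size());                   // 1
        System.out.println(clicksForOneUser.get("http://apache.org")); // 1328129787429 (last write wins)
    }
}
{code}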

> Support query by names for compact CF
> -
>
> Key: CASSANDRA-3791
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3791
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: API
>Reporter: Sylvain Lebresne
>Assignee: Sylvain Lebresne
>Priority: Minor
>  Labels: cql3
> Fix For: 1.1
>
> Attachments: 0001-Refactor-select.patch, 
> 0002-Allow-IN-on-last-column-of-PRIMARY-KEY.patch
>
>
> The current code doesn't allow doing a query by names on wide rows (compact CF). 
> I.e. with:
> {noformat}
> CREATE TABLE test1 (
> k int,
> c int,
> v int,
> PRIMARY KEY (k, c)
> ) WITH COMPACT STORAGE;
> {noformat}
> you cannot do:
> {noformat}
> SELECT v FROM test1 WHERE k = 0 AND c IN (5, 2, 8)
> {noformat}
> even though this is a simple name query.
> This ticket proposes to allow it.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-3850) get_indexed_slices loses index expressions

2012-02-04 Thread Philip Andronov (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philip Andronov updated CASSANDRA-3850:
---

Description: 
In trunk, 
CassandraServer.get_indexed_slices(ColumnParent, IndexClause, SlicePredicate, ConsistencyLevel) 
loses index_clause.expressions when constructing the RangeSliceCommand, because 
it uses the wrong constructor.

This makes the examples on http://wiki.apache.org/cassandra/CassandraCli produce 
wrong output, as does any get involving a "where" check.
Patch to fix this issue: http://pastebin.com/QQT0Tfpc

  was:
In trunk, 
get_indexed_slices(ColumnParent column_parent, IndexClause index_clause, 
SlicePredicate column_predicate, ConsistencyLevel consistency_level) 
loses index_clause.expressions when calling RangeSliceCommand command = new 
RangeSliceCommand; it uses the wrong constructor.

This makes the examples on http://wiki.apache.org/cassandra/CassandraCli produce 
wrong output, as does any get involving a "where" check.
Patch to fix this issue: http://pastebin.com/QQT0Tfpc


> get_indexed_slices loses index expressions
> --
>
> Key: CASSANDRA-3850
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3850
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Philip Andronov
>  Labels: get, indexing, search
>
> In trunk, 
> CassandraServer.get_indexed_slices(ColumnParent, IndexClause, SlicePredicate, ConsistencyLevel) 
> loses index_clause.expressions when constructing the RangeSliceCommand, because 
> it uses the wrong constructor.
> This makes the examples on http://wiki.apache.org/cassandra/CassandraCli produce 
> wrong output, as does any get involving a "where" check.
> Patch to fix this issue: http://pastebin.com/QQT0Tfpc

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (CASSANDRA-3850) get_indexed_slices loses index expressions

2012-02-04 Thread Philip Andronov (Created) (JIRA)
get_indexed_slices loses index expressions
--

 Key: CASSANDRA-3850
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3850
 Project: Cassandra
  Issue Type: Bug
Reporter: Philip Andronov


In trunk, 
get_indexed_slices(ColumnParent column_parent, IndexClause index_clause, 
SlicePredicate column_predicate, ConsistencyLevel consistency_level) 
loses index_clause.expressions when calling RangeSliceCommand command = new 
RangeSliceCommand; it uses the wrong constructor.

This makes the examples on http://wiki.apache.org/cassandra/CassandraCli produce 
wrong output, as does any get involving a "where" check.
Patch to fix this issue: http://pastebin.com/QQT0Tfpc

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira