Re: idea drive layout - 4 drives + RAID question

2012-11-01 Thread Ran User
Thanks.  Yep, I think OS + CL (2 drive RAID1) will provide the best balance
of reduced headaches / performance.  I'll also be pondering 1 drive OS, 1
drive CL as well.
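
For reference, a minimal sketch of how the OS + CL (RAID 1) / data (RAID 0) layout could map onto cassandra.yaml. The setting names are the standard ones; the mount points are hypothetical and assume the RAID 1 array is mounted at / and the RAID 0 array at /data.

```yaml
# cassandra.yaml (fragment). All paths below are assumed, not from this thread.

# Commit log lives on the 2-drive RAID 1 array shared with the OS.
commitlog_directory: /var/lib/cassandra/commitlog

# SSTables live on the 2-drive RAID 0 array (hypothetical mount point /data).
data_file_directories:
    - /data/cassandra/data

# Saved caches are small; placing them on the data array is one option.
saved_caches_directory: /data/cassandra/saved_caches
```

With 1 drive OS and 1 drive CL instead, only commitlog_directory would change, pointing at the mount point of the dedicated commit log drive.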
On Wed, Oct 31, 2012 at 9:27 PM, aaron morton aa...@thelastpickle.com wrote:

 Good question.

 There is a comment on the DS blog or docs somewhere that says that, on EC2,
 running the commit log on the RAID 0 ephemeral disks is preferred. I think the
 recommendation was specifically about how the disks are set up on EC2.

 While the commit log will be competing with logs and everything else on
 the OS volume, it would be competing with C* reads, Memtable flushing,
 compacting and repairing on the data volume.

 The only way to be sure is to test both setups.

 Cheers

 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 31/10/2012, at 1:11 PM, Ran User ranuse...@gmail.com wrote:

 Is there a concern of a large falloff in commit log write performance
 (sequential) when sharing 2 drives (RAID 1) with the OS (os and services
 writing their own logs, etc)?  Do you expect the hit to be marginal?


 On Tue, Oct 30, 2012 at 7:58 PM, aaron morton aa...@thelastpickle.com wrote:

 We also have 4-disk nodes, and we use the following layout:
 2 x OS + Commit in RAID 1
 2 x Data disk in RAID 0

 +1

 You are replicating data at the application level and want the fastest
 possible IO performance per node.

  You can already distribute the
 individual Cassandra column families on different drives by just
 setting up symlinks to the individual folders.

 There are some features coming in 1.2 that make using a JBOD setup
 easier.

 Cheers

  -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 30/10/2012, at 9:23 PM, Pieter Callewaert 
 pieter.callewa...@be-mobile.be wrote:

 We also have 4-disk nodes, and we use the following layout:
 2 x OS + Commit in RAID 1
 2 x Data disk in RAID 0

 This gives us the advantage we never have to reinstall the node when a
 drive crashes.

 Kind regards,
 Pieter


 *From:* Ran User [mailto:ranuse...@gmail.com]
 *Sent:* dinsdag 30 oktober 2012 4:33
 *To:* user@cassandra.apache.org
 *Subject:* Re: idea drive layout - 4 drives + RAID question

 Have you considered running RAID 10 for the data drives to improve MTBF?
 
  
 On one hand Cassandra is handling redundancy issues, on the other
 hand, reducing the frequency of dealing with failed nodes
 is attractive if cheap (switching RAID levels to 10). 
  

 We have no experience with software RAID (have always used hardware raid
 with BBU).  I'm assuming software RAID 1 or 10 (the mirroring part) is
 inherently reliable (perhaps minus some edge case).
 On Tue, Oct 30, 2012 at 1:07 AM, Tupshin Harper tups...@tupshin.com
 wrote:

 I would generally recommend 1 drive for OS and commit log and 3 drive
 raid 0 for data. The raid does give you good performance benefit, and it
 can be convenient to have the OS on a side drive for configuration ease and
 better MTBF.

 -Tupshin
 On Oct 29, 2012 8:56 PM, Ran User ranuse...@gmail.com wrote:
 I was hoping to achieve approx. 2x IO (write and read) performance via
 RAID 0 (by accepting a lower MTBF).

 Do you believe the performance gains of RAID 0 are much lower than that
 and/or are not worth it vs the increased server failure rate?
  
 From my understanding, RAID 10 would achieve the read performance
 benefits of RAID 0, but not the write benefits.  I'm also considering RAID
 10 to maximize server IO performance. 
  
 Currently, we're working with 1 CF.
  
  

 Thank you
 On Mon, Oct 29, 2012 at 11:51 PM, Timmy Turner timm.t...@gmail.com
 wrote:
 I'm not sure whether the raid 0 gets you anything other than headaches
 should one of the drives fail. You can already distribute the
 individual Cassandra column families on different drives by just
 setting up symlinks to the individual folders.

 2012/10/30 Ran User ranuse...@gmail.com:
  For a server with 4 drive slots only, I'm thinking:
 
  either:
 
  - OS (1 drive)
  - Commit Log (1 drive)
  - Data (2 drives, software raid 0)
 
  vs
 
  - OS  + Data (3 drives, software raid 0)
  - Commit Log (1 drive)
 
  or something else?
 
  also, if I can spare the wasted storage, would RAID 10 for cassandra data
  improve read performance and have no effect on write performance?
 
  Thank you!







Re: idea drive layout - 4 drives + RAID question

2012-10-30 Thread Ran User
Is there a concern of a large falloff in commit log write performance
(sequential) when sharing 2 drives (RAID 1) with the OS (os and services
writing their own logs, etc)?  Do you expect the hit to be marginal?


On Tue, Oct 30, 2012 at 7:58 PM, aaron morton aa...@thelastpickle.com wrote:

 We also have 4-disk nodes, and we use the following layout:
 2 x OS + Commit in RAID 1
 2 x Data disk in RAID 0

 +1

 You are replicating data at the application level and want the fastest
 possible IO performance per node.

  You can already distribute the
 individual Cassandra column families on different drives by just
 setting up symlinks to the individual folders.

 There are some features coming in 1.2 that make using a JBOD setup easier.

 Cheers

 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 30/10/2012, at 9:23 PM, Pieter Callewaert 
 pieter.callewa...@be-mobile.be wrote:

 We also have 4-disk nodes, and we use the following layout:
 2 x OS + Commit in RAID 1
 2 x Data disk in RAID 0

 This gives us the advantage we never have to reinstall the node when a
 drive crashes.

 Kind regards,
 Pieter


 *From:* Ran User [mailto:ranuse...@gmail.com]
 *Sent:* dinsdag 30 oktober 2012 4:33
 *To:* user@cassandra.apache.org
 *Subject:* Re: idea drive layout - 4 drives + RAID question

 Have you considered running RAID 10 for the data drives to improve MTBF?
 
  
 On one hand Cassandra is handling redundancy issues, on the other
 hand, reducing the frequency of dealing with failed nodes
 is attractive if cheap (switching RAID levels to 10). 
  

 We have no experience with software RAID (have always used hardware raid
 with BBU).  I'm assuming software RAID 1 or 10 (the mirroring part) is
 inherently reliable (perhaps minus some edge case).
 On Tue, Oct 30, 2012 at 1:07 AM, Tupshin Harper tups...@tupshin.com
 wrote:

 I would generally recommend 1 drive for OS and commit log and 3 drive raid
 0 for data. The raid does give you good performance benefit, and it can be
 convenient to have the OS on a side drive for configuration ease and better
 MTBF.

 -Tupshin
 On Oct 29, 2012 8:56 PM, Ran User ranuse...@gmail.com wrote:
 I was hoping to achieve approx. 2x IO (write and read) performance via
 RAID 0 (by accepting a lower MTBF).

 Do you believe the performance gains of RAID 0 are much lower than that
 and/or are not worth it vs the increased server failure rate?
  
 From my understanding, RAID 10 would achieve the read performance benefits
 of RAID 0, but not the write benefits.  I'm also considering RAID 10 to
 maximize server IO performance. 
  
 Currently, we're working with 1 CF.
  
  

 Thank you
 On Mon, Oct 29, 2012 at 11:51 PM, Timmy Turner timm.t...@gmail.com
 wrote:
 I'm not sure whether the raid 0 gets you anything other than headaches
 should one of the drives fail. You can already distribute the
 individual Cassandra column families on different drives by just
 setting up symlinks to the individual folders.

 2012/10/30 Ran User ranuse...@gmail.com:
  For a server with 4 drive slots only, I'm thinking:
 
  either:
 
  - OS (1 drive)
  - Commit Log (1 drive)
  - Data (2 drives, software raid 0)
 
  vs
 
  - OS  + Data (3 drives, software raid 0)
  - Commit Log (1 drive)
 
  or something else?
 
  also, if I can spare the wasted storage, would RAID 10 for cassandra data
  improve read performance and have no effect on write performance?
 
  Thank you!





idea drive layout - 4 drives + RAID question

2012-10-29 Thread Ran User
For a server with 4 drive slots only, I'm thinking:

either:

- OS (1 drive)
- Commit Log (1 drive)
- Data (2 drives, software raid 0)

vs

- OS  + Data (3 drives, software raid 0)
- Commit Log (1 drive)

or something else?

also, if I can spare the wasted storage, would RAID 10 for cassandra data
improve read performance and have no effect on write performance?

Thank you!


Re: idea drive layout - 4 drives + RAID question

2012-10-29 Thread Ran User
I was hoping to achieve approx. 2x IO (write and read) performance via RAID
0 (by accepting a lower MTBF).

Do you believe the performance gains of RAID 0 are much lower than that
and/or are not worth it vs the increased server failure rate?

From my understanding, RAID 10 would achieve the read performance benefits
of RAID 0, but not the write benefits.  I'm also considering RAID 10 to
maximize server IO performance.
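
A rough way to put numbers on the failure-rate side of that trade-off is sketched below in Java. It is a back-of-the-envelope model, not a measurement: it assumes each drive fails independently with the same annual failure rate (3% is an assumed figure), and it treats a mirror as lost only if both drives fail in the same year, which ignores the much shorter rebuild window and so overstates the mirror's risk. The class and method names are illustrative.

```java
public class RaidFailureSketch {

    // RAID 0 stripe: losing any one of the drives loses the whole array.
    public static double raid0(double annualFailureRate, int drives) {
        return 1.0 - Math.pow(1.0 - annualFailureRate, drives);
    }

    // 2-drive RAID 1 mirror: lost only if both drives fail.
    // Crude assumption: "both fail within the same year", independently.
    public static double raid1(double annualFailureRate) {
        return annualFailureRate * annualFailureRate;
    }

    // 4-drive RAID 10: a stripe over two mirrors, lost if either mirror is lost.
    public static double raid10(double annualFailureRate) {
        double mirrorLoss = raid1(annualFailureRate);
        return 1.0 - Math.pow(1.0 - mirrorLoss, 2);
    }

    public static void main(String[] args) {
        double afr = 0.03; // assumed per-drive annual failure rate
        System.out.printf("single drive   : %.4f%n", afr);
        System.out.printf("RAID 0, 2 disks: %.4f%n", raid0(afr, 2));
        System.out.printf("RAID 0, 3 disks: %.4f%n", raid0(afr, 3));
        System.out.printf("RAID 1, 2 disks: %.4f%n", raid1(afr));
        System.out.printf("RAID 10, 4 disks: %.4f%n", raid10(afr));
    }
}
```

Under these assumptions a 2-drive RAID 0 roughly doubles the chance of losing the data volume compared with a single drive, and a 3-drive stripe roughly triples it. Whether that matters depends on how expensive it is to rebuild a node from its replicas.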

Currently, we're working with 1 CF.


Thank you

On Mon, Oct 29, 2012 at 11:51 PM, Timmy Turner timm.t...@gmail.com wrote:

 I'm not sure whether the raid 0 gets you anything other than headaches
 should one of the drives fail. You can already distribute the
 individual Cassandra column families on different drives by just
 setting up symlinks to the individual folders.

 2012/10/30 Ran User ranuse...@gmail.com:
  For a server with 4 drive slots only, I'm thinking:
 
  either:
 
  - OS (1 drive)
  - Commit Log (1 drive)
  - Data (2 drives, software raid 0)
 
  vs
 
  - OS  + Data (3 drives, software raid 0)
  - Commit Log (1 drive)
 
  or something else?
 
  also, if I can spare the wasted storage, would RAID 10 for cassandra data
  improve read performance and have no effect on write performance?
 
  Thank you!



Re: idea drive layout - 4 drives + RAID question

2012-10-29 Thread Ran User
Have you considered running RAID 10 for the data drives to improve MTBF?

On one hand Cassandra is handling redundancy issues, on the other
hand, reducing the frequency of dealing with failed nodes
is attractive if cheap (switching RAID levels to 10).

We have no experience with software RAID (have always used hardware raid
with BBU).  I'm assuming software RAID 1 or 10 (the mirroring part) is
inherently reliable (perhaps minus some edge case).

On Tue, Oct 30, 2012 at 1:07 AM, Tupshin Harper tups...@tupshin.com wrote:

 I would generally recommend 1 drive for OS and commit log and 3 drive raid
 0 for data. The raid does give you good performance benefit, and it can be
 convenient to have the OS on a side drive for configuration ease and better
 MTBF.

 -Tupshin
 On Oct 29, 2012 8:56 PM, Ran User ranuse...@gmail.com wrote:

 I was hoping to achieve approx. 2x IO (write and read) performance via
 RAID 0 (by accepting a lower MTBF).

 Do you believe the performance gains of RAID 0 are much lower than that
 and/or are not worth it vs the increased server failure rate?

 From my understanding, RAID 10 would achieve the read performance
 benefits of RAID 0, but not the write benefits.  I'm also considering RAID
 10 to maximize server IO performance.

 Currently, we're working with 1 CF.


 Thank you

 On Mon, Oct 29, 2012 at 11:51 PM, Timmy Turner timm.t...@gmail.com wrote:

 I'm not sure whether the raid 0 gets you anything other than headaches
 should one of the drives fail. You can already distribute the
 individual Cassandra column families on different drives by just
 setting up symlinks to the individual folders.

 2012/10/30 Ran User ranuse...@gmail.com:
  For a server with 4 drive slots only, I'm thinking:
 
  either:
 
  - OS (1 drive)
  - Commit Log (1 drive)
  - Data (2 drives, software raid 0)
 
  vs
 
  - OS  + Data (3 drives, software raid 0)
  - Commit Log (1 drive)
 
  or something else?
 
  also, if I can spare the wasted storage, would RAID 10 for cassandra data
  improve read performance and have no effect on write performance?
 
  Thank you!





Re: Astyanax InstantiationException when accessing ColumnList

2012-09-12 Thread Ran User
Yes, you are right, the issue is here in Astyanax's
AnnotatedCompositeSerializer.java:

private T createContents(Class<T> clazz) throws
        InstantiationException, IllegalAccessException {
    return clazz.newInstance();
}

I'm not sure how to get that reflection call working with my Scala
TestCompositeColumn class.  I've posted in scala-user hoping someone there
will likely know the Java - Scala path.
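
One possible cause, offered as an assumption rather than something confirmed in this thread: Class.newInstance() throws InstantiationException when the class has no nullary constructor visible to reflection. The class name in the stack trace contains a `$`, which suggests it is nested inside ReportingDao. If it is an inner class of a class (rather than a member of an object or a top-level class), its compiled constructor takes the enclosing instance as a hidden first argument, so the `def this()` constructor is not truly no-arg on the Java side. The Java sketch below reproduces that behaviour with hypothetical class names.

```java
public class NewInstanceSketch {

    // Static nested class with a public no-arg constructor:
    // reflection can instantiate it without any outer instance.
    public static class StaticColumn {
        public StaticColumn() {}
    }

    // Non-static inner class: the compiled constructor is
    // InnerColumn(NewInstanceSketch outer), so no nullary constructor exists.
    public class InnerColumn {
        public InnerColumn() {}
    }

    // Same reflection call as createContents in the snippet above.
    // Class.newInstance() is deprecated in newer JDKs but still available.
    @SuppressWarnings("deprecation")
    public static <T> T createContents(Class<T> clazz)
            throws InstantiationException, IllegalAccessException {
        return clazz.newInstance();
    }

    // Returns true if the reflection call succeeds, false if it throws.
    public static boolean canInstantiate(Class<?> clazz) {
        try {
            createContents(clazz);
            return true;
        } catch (InstantiationException | IllegalAccessException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("static nested: " + canInstantiate(StaticColumn.class));
        System.out.println("inner class  : " + canInstantiate(InnerColumn.class));
    }
}
```

If that is what is happening here, moving the composite column class to the top level of its file, or into an object, would be the first thing to try.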

If anyone here has used Astyanax in Scala w/ composite columns, and can
successfully read from Cassandra, I'd love to see one Scala class and
companion object example :)


On Wed, Sep 12, 2012 at 10:28 PM, aaron morton aa...@thelastpickle.com wrote:

 Was there more to the error message? It looks like there should be a
 "caused by" exception there:
 https://github.com/Netflix/astyanax/blob/master/src/main/java/com/netflix/astyanax/serializers/AnnotatedCompositeSerializer.java#L114

 The InstantiationException is being raised when it tries to create an
 instance of the column type:

 https://github.com/Netflix/astyanax/blob/master/src/main/java/com/netflix/astyanax/serializers/AnnotatedCompositeSerializer.java#L166

 I would check everything is in the class path and create an issue on
 https://github.com/Netflix/astyanax/issues if you get stuck.

 Cheers



 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 12/09/2012, at 2:04 PM, Ran User ranuse...@gmail.com wrote:

 Oops, forgot to mention Cassandra version - 1.1.4

 On Tue, Sep 11, 2012 at 5:54 AM, Ran User ranuse...@gmail.com wrote:

 Stuck for hours on this one, thanks in advance!

 -  Scala 2.9.2
 - Astyanax 1.0.6 (also tried 1.0.5)
 - Using CompositeRowKey, CompositeColumnName
 - No problem inserting into Cassandra
 - Can read a row, and ColumnList.size() returns the correct count; however,
 any attempt to access the ColumnList (e.g. iterating over it,
 getColumnByIndex(), getColumnByName(), etc.) throws the following
 exception:

 Exception:

 java.lang.RuntimeException: java.lang.InstantiationException

 relevant stack trace:

 java.lang.RuntimeException: java.lang.InstantiationException: shops.integration.db.scalaquery.ReportingDao$MetricsLogFileCompositeColumn
     at com.netflix.astyanax.serializers.AnnotatedCompositeSerializer.fromByteBuffer(AnnotatedCompositeSerializer.java:136)
     at com.netflix.astyanax.serializers.AbstractSerializer.fromBytes(AbstractSerializer.java:40)
     at com.netflix.astyanax.thrift.model.ThriftColumnOrSuperColumnListImpl.constructMap(ThriftColumnOrSuperColumnListImpl.java:201)
     at com.netflix.astyanax.thrift.model.ThriftColumnOrSuperColumnListImpl.getColumn(ThriftColumnOrSuperColumnListImpl.java:189)
     at com.netflix.astyanax.thrift.model.ThriftColumnOrSuperColumnListImpl.getColumnByName(ThriftColumnOrSuperColumnListImpl.java:103)

 Relevant sample code:

 class TestCompositeColumn(@(Component @field) var logFileId: Long,
     @(Component @field) var dt: String, @(Component @field) var dk: String)
     extends Ordered[TestCompositeColumn] {
   def this() = this(0L, "", "")
   // equals, hashCode, compare all implemented
 }

 I've also tried this variation on the class:

 class TestCompositeColumn(idIn: Long, key1In: String, key2In: String)
     extends Ordered[TestCompositeColumn] {
   @Component(ordinal = 0) var id: Long = idIn
   @Component(ordinal = 1) var key1: String = key1In
   @Component(ordinal = 2) var key2: String = key2In

   def this() = this(0, null, null)
   // equals, hashCode, compare all implemented
 }

 val TEST_COLUMN_FAMILY = new ColumnFamily[TestRowKey, TestCompositeColumn](
   "test_column_family",
   new AnnotatedCompositeSerializer[TestRowKey](classOf[TestRowKey]),
   new AnnotatedCompositeSerializer[TestCompositeColumn](classOf[TestCompositeColumn]),
   BytesArraySerializer.get())

 var columnList = keyspace.prepareQuery(TEST_COLUMN_FAMILY)
 .getKey(TestRowKey(1l, 2012090100))
 .execute().getResult()

 // OK - will return 6 for example, also verified via cassandra-cli
 println(columnList.size())

 // ERROR - will throw the exception above. Iterating, or any other type of
 // access, will also throw the same exception
 println(columnList.getColumnByIndex(0).getStringValue())

 Thank you!!!







Re: Astyanax InstantiationException when accessing ColumnList

2012-09-11 Thread Ran User
Oops, forgot to mention Cassandra version - 1.1.4

On Tue, Sep 11, 2012 at 5:54 AM, Ran User ranuse...@gmail.com wrote:

 Stuck for hours on this one, thanks in advance!

 -  Scala 2.9.2
 - Astyanax 1.0.6 (also tried 1.0.5)
 - Using CompositeRowKey, CompositeColumnName
 - No problem inserting into Cassandra
 - Can read a row, and ColumnList.size() returns the correct count; however,
 any attempt to access the ColumnList (e.g. iterating over it,
 getColumnByIndex(), getColumnByName(), etc.) throws the following
 exception:

 Exception:

 java.lang.RuntimeException: java.lang.InstantiationException

 relevant stack trace:

 java.lang.RuntimeException: java.lang.InstantiationException: shops.integration.db.scalaquery.ReportingDao$MetricsLogFileCompositeColumn
     at com.netflix.astyanax.serializers.AnnotatedCompositeSerializer.fromByteBuffer(AnnotatedCompositeSerializer.java:136)
     at com.netflix.astyanax.serializers.AbstractSerializer.fromBytes(AbstractSerializer.java:40)
     at com.netflix.astyanax.thrift.model.ThriftColumnOrSuperColumnListImpl.constructMap(ThriftColumnOrSuperColumnListImpl.java:201)
     at com.netflix.astyanax.thrift.model.ThriftColumnOrSuperColumnListImpl.getColumn(ThriftColumnOrSuperColumnListImpl.java:189)
     at com.netflix.astyanax.thrift.model.ThriftColumnOrSuperColumnListImpl.getColumnByName(ThriftColumnOrSuperColumnListImpl.java:103)

 Relevant sample code:

 class TestCompositeColumn(@(Component @field) var logFileId: Long,
     @(Component @field) var dt: String, @(Component @field) var dk: String)
     extends Ordered[TestCompositeColumn] {
   def this() = this(0L, "", "")
   // equals, hashCode, compare all implemented
 }

 I've also tried this variation on the class:

 class TestCompositeColumn(idIn: Long, key1In: String, key2In: String)
     extends Ordered[TestCompositeColumn] {
   @Component(ordinal = 0) var id: Long = idIn
   @Component(ordinal = 1) var key1: String = key1In
   @Component(ordinal = 2) var key2: String = key2In

   def this() = this(0, null, null)
   // equals, hashCode, compare all implemented
 }

 val TEST_COLUMN_FAMILY = new ColumnFamily[TestRowKey, TestCompositeColumn](
   "test_column_family",
   new AnnotatedCompositeSerializer[TestRowKey](classOf[TestRowKey]),
   new AnnotatedCompositeSerializer[TestCompositeColumn](classOf[TestCompositeColumn]),
   BytesArraySerializer.get())

 var columnList = keyspace.prepareQuery(TEST_COLUMN_FAMILY)
 .getKey(TestRowKey(1l, 2012090100))
 .execute().getResult()

 // OK - will return 6 for example, also verified via cassandra-cli
 println(columnList.size())

 // ERROR - will throw the exception above. Iterating, or any other type of
 // access, will also throw the same exception
 println(columnList.getColumnByIndex(0).getStringValue())

 Thank you!!!