DSE 4.6 with OpsCenter 5.1.1, agent can't start, port 9042 is occupied by DSE

2015-04-05 Thread Serega Sheypak
Hi, I'm getting a weird problem when the agent tries to connect to OpsCenter.
OpsCenter is installed on a VM together with DSE and the agent.
It's not for production; I have 3 VMs with DSE and OpsCenter for dev/test
purposes.

The stacktrace from the agent log is:

vagrant@dsenode03:~$ sudo cat /var/log/datastax-agent/agent.log

 Starting DataStax agent monitor datastax_agent_monitor

 INFO [main] 2015-04-05 13:32:31,594 Loading conf files:
/var/lib/datastax-agent/conf/address.yaml

 INFO [main] 2015-04-05 13:32:31,642 Java vendor/version: Java HotSpot(TM)
64-Bit Server VM/1.7.0_76

 INFO [main] 2015-04-05 13:32:31,642 DataStax Agent version: 5.1.1

 INFO [main] 2015-04-05 13:32:31,679 Default config values:
{:cassandra_port 9042, :rollups300_ttl 2419200, :settings_cf settings,
:restore_req_update_period 60, :my_channel_prefix /agent, :poll_period
60, :thrift_conn_timeout 1, :rollups60_ttl 604800, :stomp_port 61620,
:shorttime_interval 10, :longtime_interval 300, :max-seconds-to-sleep 25,
:private-conf-props [initial_token listen_address broadcast_address
rpc_address], :thrift_port 9160, :async_retry_timeout 5,
:agent-conf-group global-cluster-agent-group, :jmx_host 127.0.0.1,
:ec2_metadata_api_host 169.254.169.254, :metrics_enabled 1,
:async_queue_size 5000, :backup_staging_dir nil, :read-buffer-size
1000, :remote_verify_max 30, :disk_usage_update_period 60,
:throttle-bytes-per-second 50, :rollups7200_ttl 31536000,
:remote_backup_retries 3, :ssl_keystore nil, :rollup_snapshot_period 300,
:is_package true, :monitor_command
/usr/share/datastax-agent/bin/datastax_agent_monitor,
:thrift_socket_timeout 5000, :remote_verify_initial_delay 1000,
:cassandra_log_location /var/log/cassandra/system.log,
:remote_backup_region us-west-1, :restore_on_transfer_failure false,
:tmp_dir /var/lib/datastax-agent/tmp/, :config_md5 nil, :jmx_port 7199,
:write-buffer-size 10, :jmx_metrics_threadpool_size 4, :use_ssl 0,
:rollups86400_ttl 0, :nodedetails_threadpool_size 3, :api_port 61621,
:kerberos_service nil, :backup_file_queue_max 1, :jmx_thread_pool_size
5, :production 1, :runs_sudo 1, :max_file_transfer_attempts 30,
:stomp_interface nil, :storage_keyspace OpsCenter, :hosts [127.0.0.1],
:rollup_snapshot_threshold 300, :jmx_retry_timeout 30, :unthrottled-default
100, :remote_backup_retry_delay 5000, :remote_backup_timeout 1000,
:seconds-to-read-kill-channel 0.005, :realtime_interval 5, :pdps_ttl 259200}

 INFO [main] 2015-04-05 13:32:31,924 Waiting for the config from OpsCenter

 INFO [main] 2015-04-05 13:32:31,925 Attempting to determine Cassandra's
broadcast address through JMX

 INFO [Initialization] 2015-04-05 13:32:31,926 New JMX connection (
127.0.0.1:7199)

 INFO [main] 2015-04-05 13:32:31,947 Starting Jetty server: {:join? false,
:ssl? false, :host nil, :port 61621}

 INFO [Jetty] 2015-04-05 13:32:32,026 Jetty server started

 INFO [Initialization] 2015-04-05 13:32:32,054 Using 192.168.56.30 as the
cassandra broadcast address

 INFO [Initialization] 2015-04-05 13:32:32,135 cassandra RPC address is  nil

 INFO [Initialization] 2015-04-05 13:32:32,135 agent RPC address is
192.168.56.30

 INFO [Initialization] 2015-04-05 13:32:32,135 agent RPC broadcast address
is  192.168.56.30

ERROR [Initialization] 2015-04-05 13:32:32,342 Can't connect to Cassandra,
retrying

com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s)
tried for query failed (tried: /127.0.0.1:9042
(com.datastax.driver.core.TransportException: [/127.0.0.1:9042] Cannot
connect))

at
com.datastax.driver.core.ControlConnection.reconnectInternal(ControlConnection.java:220)

at
com.datastax.driver.core.ControlConnection.connect(ControlConnection.java:78)

at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:1231)

at com.datastax.driver.core.Cluster.init(Cluster.java:158)

at com.datastax.driver.core.Cluster.connect(Cluster.java:246)

at clojurewerkz.cassaforte.client$connect_or_close.doInvoke(client.clj:149)

at clojure.lang.RestFn.invoke(RestFn.java:410)

at clojurewerkz.cassaforte.client$connect.invoke(client.clj:165)

at opsagent.cassandra$setup_cassandra$fn__2491.invoke(cassandra.clj:269)

at again.core$with_retries_STAR_$fn__2363.invoke(core.clj:98)

at again.core$with_retries_STAR_.invoke(core.clj:97)

at opsagent.cassandra$setup_cassandra.invoke(cassandra.clj:267)

at opsagent.opsagent$setup_cassandra.invoke(opsagent.clj:152)

at opsagent.jmx$determine_ip.invoke(jmx.clj:276)

at opsagent.jmx$setup_jmx$fn__2867.invoke(jmx.clj:293)

at clojure.lang.AFn.run(AFn.java:24)

 at java.lang.Thread.run(Thread.java:745)


And it keeps retrying several times.


I did:

vagrant@dsenode03:~$ sudo netstat -alnpt | grep 9042

tcp        0      0 192.168.56.30:9042      0.0.0.0:*               LISTEN      5490/java


and see that something is already listening on this port.

# cut some output

vagrant@dsenode03:~$ sudo lsof -p 5490

COMMAND  PID  USER   FD   TYPE DEVICE SIZE/OFF   NODE
NAME

java      5490 cassandra  cwd       DIR
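
So the agent's defaults point it at 127.0.0.1 (the ":hosts [127.0.0.1]" entry in the
log above), while netstat shows Cassandra only bound to 192.168.56.30:9042. A minimal
way I can reproduce that mismatch with the same Java driver the agent uses (both
addresses are taken from the log and netstat output above; the object name and the
rest of the code are just a sketch, not the agent's actual code):

import com.datastax.driver.core.Cluster

object ConnectCheck extends App {
  // Try the agent's default contact point and the address Cassandra is bound to.
  for (host <- Seq("127.0.0.1", "192.168.56.30")) {
    try {
      val cluster = Cluster.builder().addContactPoint(host).withPort(9042).build()
      cluster.connect()   // same call path that throws NoHostAvailableException above
      println(host + ": connected to cluster " + cluster.getMetadata.getClusterName)
      cluster.close()
    } catch {
      case e: Exception => println(host + ": failed - " + e.getMessage)
    }
  }
}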

Can cqlsh COPY command be run through

2015-04-05 Thread Tiwari, Tarun
Hi,

I am looking to find out whether the CQLSH COPY command can be run from a Spark Scala
program, and whether it benefits from the parallelism achieved by Spark.
I am doing something like below:

import org.apache.spark.{SparkConf, SparkContext}
import com.datastax.spark.connector.cql.CassandraConnector

val conf = new SparkConf(true).setMaster("spark://Master-Host:7077")
  .setAppName("Load Cs Table using COPY TO")
lazy val sc = new SparkContext(conf)

CassandraConnector(conf).withSessionDo { session =>
  session.execute("truncate wfcdb.test_wfctotal;")
  session.execute("""COPY wfcdb.test_wfctotal
      (wfctotalid, timesheetitemid, employeeid, durationsecsqty, wageamt, moneyamt,
      applydtm, laboracctid, paycodeid, startdtm, stimezoneid, adjstartdtm,
      adjapplydtm, enddtm, homeaccountsw, notpaidsw, wfcjoborgid, unapprovedsw,
      durationdaysqty, updatedtm, totaledversion, acctapprovalnum) FROM
      '/home/analytics/Documents/wfctotal.dat' WITH DELIMITER = '|' AND HEADER = true;""")
}
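
In case COPY can't go through the driver at all (as far as I know it is a cqlsh shell
command rather than server-side CQL), here is the rough spark-cassandra-connector route
I would fall back to. This is only a sketch: the file path, table and column names are
the ones from the COPY statement above, but the line parsing and type conversion are
simplified and would need to be filled in properly.

import org.apache.spark.{SparkConf, SparkContext}
import com.datastax.spark.connector._
import com.datastax.spark.connector.cql.CassandraConnector

val conf = new SparkConf(true).setMaster("spark://Master-Host:7077")
  .setAppName("Load wfcdb.test_wfctotal via connector")
val sc = new SparkContext(conf)

CassandraConnector(conf).withSessionDo { session =>
  session.execute("TRUNCATE wfcdb.test_wfctotal;")
}

// Read the pipe-delimited file in parallel and drop the header line.
val lines  = sc.textFile("/home/analytics/Documents/wfctotal.dat")
val header = lines.first()
val fields = lines.filter(_ != header).map(_.split('|'))

// Map each split line to a tuple (one element per target column, in the same order
// as the COPY column list) and let the connector write it in parallel.
fields.map(f => (f(0), f(1), f(2) /* ... remaining columns ... */))
  .saveToCassandra("wfcdb", "test_wfctotal",
    SomeColumns("wfctotalid", "timesheetitemid", "employeeid" /* ... remaining columns ... */))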

Regards,
Tarun Tiwari | Workforce Analytics-ETL | Kronos India
M: +91 9540 28 27 77 | Tel: +91 120 4015200



How much disk is needed to compact Leveled compaction?

2015-04-05 Thread Jean Tremblay
Hi,
I have a cluster of 5 nodes. We use Cassandra 2.1.3.

The 5 nodes each use about 50-57% of their 1 TB SSD.
One node managed to compact all its data; during one compaction this node used
almost 100% of the drive. The other nodes refuse to continue compaction,
claiming that there is not enough disk space.

From the documentation, LeveledCompactionStrategy should be able to compact my
data; at least that is what I understand:

Size-tiered compaction requires at least as much free disk space for 
compaction as the size of the largest column family. Leveled compaction needs 
much less space for compaction, only 10 * sstable_size_in_mb. However, even if 
you’re using leveled compaction, you should leave much more free disk space 
available than this to accommodate streaming, repair, and snapshots, which can 
easily use 10GB or more of disk space. Furthermore, disk performance tends to 
decline after 80 to 90% of the disk space is used, so don’t push the 
boundaries.

This is the disk usage. Node 4 is the only one that could compact everything.
node0: /dev/disk1 931Gi 534Gi 396Gi 57% /
node1: /dev/disk1 931Gi 513Gi 417Gi 55% /
node2: /dev/disk1 931Gi 526Gi 404Gi 57% /
node3: /dev/disk1 931Gi 507Gi 424Gi 54% /
node4: /dev/disk1 931Gi 475Gi 456Gi 51% /

When I try to compact the other ones I get this:

objc[18698]: Class JavaLaunchHelper is implemented in both 
/Library/Java/JavaVirtualMachines/jdk1.8.0_40.jdk/Contents/Home/bin/java and 
/Library/Java/JavaVirtualMachines/jdk1.8.0_40.jdk/Contents/Home/jre/lib/libinstrument.dylib.
 One of the two will be used. Which one is undefined.
error: Not enough space for compaction, estimated sstables = 2894, expected 
write size = 485616651726
-- StackTrace --
java.lang.RuntimeException: Not enough space for compaction, estimated sstables 
= 2894, expected write size = 485616651726
at 
org.apache.cassandra.db.compaction.CompactionTask.checkAvailableDiskSpace(CompactionTask.java:293)
at 
org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:127)
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
at 
org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:76)
at 
org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59)
at 
org.apache.cassandra.db.compaction.CompactionManager$7.runMayThrow(CompactionManager.java:512)
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)

I did not set sstable_size_in_mb; I use the 160 MB default.
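
(Rough arithmetic, if I read those numbers right: 2894 estimated sstables at the
160 MiB default is about 485,500,000,000 bytes, which essentially matches the expected
write size of 485616651726 above, i.e. roughly 452 GiB. Only node4, with 456 Gi free,
has that much headroom, which would explain why it is the only one that got through.)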

Is it normal that compaction needs so much disk space? What would be the best
way to overcome this problem?

Thanks for your help



Re: Timeseries analysis using Cassandra and partition by date period

2015-04-05 Thread Kevin Burton
 Hi, I switched from HBase to Cassandra and am trying to find a solution
for timeseries analysis on top of Cassandra.

Depending on what you’re looking for, you might want to check out KairosDB.

0.95 beta2 just shipped yesterday as well so you have good timing.

https://github.com/kairosdb/kairosdb

On Sat, Apr 4, 2015 at 11:29 AM, Serega Sheypak serega.shey...@gmail.com
wrote:

 Okay, so bucketing by day/week/month is a matter of capacity planning and of
 the actual questions I want to ask.
 As a conclusion:
 I have an events table:

 CREATE TABLE user_plans (
   id timeuuid,
   user_id timeuuid,
   event_ts timestamp,
   event_type int,
   some_other_attr text,
   PRIMARY KEY (user_id, event_ts)
 );
 which fits the tactical queries:
 select smth from user_plans where user_id = 'xxx' and event_ts < now()

 Then I create a second table, user_plans_daily (or weekly, monthly)

 with DDL:
 CREATE TABLE user_plans_daily (
   ymd int,
   user_id timeuuid,
   event_ts timestamp,
   event_type int,
   some_other_attr text,
   PRIMARY KEY ((ymd, user_id), event_ts)
 ) WITH CLUSTERING ORDER BY (event_ts DESC);

 And this table is good for answering strategic questions:
 select * from
 user_plans_daily/weekly/monthly
 where ymd in ()
 And I should avoid a long list inside the IN clause; that is why you
 suggest I create a bigger bucket, correct?
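
 If the IN list does get long, the alternative I have in mind is one query per
 (ymd, user_id) partition, fired asynchronously. A rough sketch with the Java
 driver; the table and column names are from the user_plans_daily DDL above, while
 the contact point, keyspace name and the concrete day list are just placeholders:

 import scala.collection.JavaConverters._
 import com.datastax.driver.core.Cluster
 import com.datastax.driver.core.utils.UUIDs

 val cluster = Cluster.builder().addContactPoint("127.0.0.1").build()
 val session = cluster.connect("events_ks")        // keyspace name is hypothetical

 val stmt = session.prepare(
   "SELECT * FROM user_plans_daily WHERE ymd = ? AND user_id = ?")

 val days   = Seq(20150101, 20150102, 20150103)    // ... up to the end of the window
 val userId = UUIDs.timeBased()                    // placeholder for the real user's timeuuid

 // One async query per day bucket instead of one big IN (...).
 val futures = days.map(d => session.executeAsync(stmt.bind(Int.box(d), userId)))
 val rows    = futures.flatMap(f => f.getUninterruptibly.asScala)

 rows.foreach(println)
 cluster.close()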


 2015-04-04 20:00 GMT+02:00 Jack Krupansky jack.krupan...@gmail.com:

 It sounds like your time bucket should be a month, but it depends on the
 amount of data per user per day and your main query range. Within the
 partition you can then query for a range of days.

 Yes, all of the rows within a partition are stored on one physical node
 as well as the replica nodes.

 -- Jack Krupansky

 On Sat, Apr 4, 2015 at 1:38 PM, Serega Sheypak serega.shey...@gmail.com
 wrote:

 non-equal relation on a partition key is not supported
 Ok, can I generate a select query:
 select some_attributes
 from events where ymd = 20150101 or ymd = 20150102 or ymd = 20150103 ... or
 ymd = 20150331

  The partition key determines which node can satisfy the query
 So you mean that all rows with the same (ymd, user_id) would be on
 one physical node?


 2015-04-04 16:38 GMT+02:00 Jack Krupansky jack.krupan...@gmail.com:

 Unfortunately, a non-equal relation on a partition key is not
 supported. You would need to bucket by some larger unit, like a month, and
 then use the date/time as a clustering column for the row key. Then you
 could query within the partition. The partition key determines which node
 can satisfy the query. Designing your partition key judiciously is the key
 (haha!) to performant Cassandra applications.

 -- Jack Krupansky

 On Sat, Apr 4, 2015 at 9:33 AM, Serega Sheypak 
 serega.shey...@gmail.com wrote:

 Hi, we plan to have 10^8 users and each user could generate 10 events
 per day.
 So we have:
 10^8 records per day
 10^8*30 records per month.
 Our time-window analysis could span from 1 to 6 months.

 Right now the PK is PRIMARY KEY (user_id, event_ts), where event_ts is the
 exact timestamp of the event.

 So you suggest this approach:
 PRIMARY KEY ((ymd, user_id), event_ts)
 WITH CLUSTERING ORDER BY (event_ts DESC);

 where ymd = 20150102 (the 2nd of January)?

 What happens to writes:
 SSTables with past days (ymd < current_day) stay untouched and don't
 take part in the compaction process, since there are no changes to them?

 What happens to reads:
 I issue the query:
 select some_attributes
 from events where ymd >= 20150101 and ymd < 20150301
 Does Cassandra skip SSTables which don't have ymd in the specified range
 and give me a kind of partition elimination, like in traditional DBs?


 2015-04-04 14:41 GMT+02:00 Jack Krupansky jack.krupan...@gmail.com:

 It depends on the actual number of events per user, but simply
 bucketing the partition key can give you the same effect - clustering 
 rows
 by time range. A composite partition key could be comprised of the user
 name and the date.

 It also depends on the data rate - is it many events per day or just
 a few events per week, or over what time period. You need to be careful -
 you don't want your Cassandra partitions to be too big (millions of rows)
 or too small (just a few or even one row per partition.)

 -- Jack Krupansky

 On Sat, Apr 4, 2015 at 7:03 AM, Serega Sheypak 
 serega.shey...@gmail.com wrote:

 Hi, I switched from HBase to Cassandra and am trying to find a
 solution for timeseries analysis on top of Cassandra.
 I have an entity named Event.
 Event has attributes:
 user_id - a guy who triggered event
 event_ts - when the event happened
 event_type - type of event
 some_other_attr - some other attrs we don't care about right now.

 The DDL for entity event looks this way:

 CREATE TABLE user_plans (
   id timeuuid,
   user_id timeuuid,
   event_ts timestamp,
   event_type int,
   some_other_attr text,
   PRIMARY KEY (user_id, event_ts)
 );

 The table is unbounded; it would grow continuously during the application's
 lifetime.
 I want to ask the question:
 Cassandra, give me all events where 

Re: How much disk is needed to compact Leveled compaction?

2015-04-05 Thread daemeon reiydelle
You appear to have multiple java binaries in your path. That needs to be
resolved.

sent from my mobile
Daemeon C.M. Reiydelle
USA 415.501.0198
London +44.0.20.8144.9872