DSE 4.6 with OpsCenter 5.1.1, agent can't start, port 9042 is occupied by DSE
Hi, I'm getting a weird problem when the agent tries to connect to OpsCenter. OpsCenter is installed on a VM together with DSE and the agent. This is not for production; I have 3 VMs with DSE and OpsCenter for dev/test purposes. The stack trace from the agent log is:

vagrant@dsenode03:~$ sudo cat /var/log/datastax-agent/agent.log
Starting DataStax agent monitor datastax_agent_monitor
 INFO [main] 2015-04-05 13:32:31,594 Loading conf files: /var/lib/datastax-agent/conf/address.yaml
 INFO [main] 2015-04-05 13:32:31,642 Java vendor/version: Java HotSpot(TM) 64-Bit Server VM/1.7.0_76
 INFO [main] 2015-04-05 13:32:31,642 DataStax Agent version: 5.1.1
 INFO [main] 2015-04-05 13:32:31,679 Default config values: {:cassandra_port 9042, :rollups300_ttl 2419200, :settings_cf settings, :restore_req_update_period 60, :my_channel_prefix /agent, :poll_period 60, :thrift_conn_timeout 1, :rollups60_ttl 604800, :stomp_port 61620, :shorttime_interval 10, :longtime_interval 300, :max-seconds-to-sleep 25, :private-conf-props [initial_token listen_address broadcast_address rpc_address], :thrift_port 9160, :async_retry_timeout 5, :agent-conf-group global-cluster-agent-group, :jmx_host 127.0.0.1, :ec2_metadata_api_host 169.254.169.254, :metrics_enabled 1, :async_queue_size 5000, :backup_staging_dir nil, :read-buffer-size 1000, :remote_verify_max 30, :disk_usage_update_period 60, :throttle-bytes-per-second 50, :rollups7200_ttl 31536000, :remote_backup_retries 3, :ssl_keystore nil, :rollup_snapshot_period 300, :is_package true, :monitor_command /usr/share/datastax-agent/bin/datastax_agent_monitor, :thrift_socket_timeout 5000, :remote_verify_initial_delay 1000, :cassandra_log_location /var/log/cassandra/system.log, :remote_backup_region us-west-1, :restore_on_transfer_failure false, :tmp_dir /var/lib/datastax-agent/tmp/, :config_md5 nil, :jmx_port 7199, :write-buffer-size 10, :jmx_metrics_threadpool_size 4, :use_ssl 0, :rollups86400_ttl 0, :nodedetails_threadpool_size 3, :api_port 61621, :kerberos_service nil, :backup_file_queue_max 1, :jmx_thread_pool_size 5, :production 1, :runs_sudo 1, :max_file_transfer_attempts 30, :stomp_interface nil, :storage_keyspace OpsCenter, :hosts [127.0.0.1], :rollup_snapshot_threshold 300, :jmx_retry_timeout 30, :unthrottled-default 100, :remote_backup_retry_delay 5000, :remote_backup_timeout 1000, :seconds-to-read-kill-channel 0.005, :realtime_interval 5, :pdps_ttl 259200}
 INFO [main] 2015-04-05 13:32:31,924 Waiting for the config from OpsCenter
 INFO [main] 2015-04-05 13:32:31,925 Attempting to determine Cassandra's broadcast address through JMX
 INFO [Initialization] 2015-04-05 13:32:31,926 New JMX connection (127.0.0.1:7199)
 INFO [main] 2015-04-05 13:32:31,947 Starting Jetty server: {:join? false, :ssl? false, :host nil, :port 61621}
 INFO [Jetty] 2015-04-05 13:32:32,026 Jetty server started
 INFO [Initialization] 2015-04-05 13:32:32,054 Using 192.168.56.30 as the cassandra broadcast address
 INFO [Initialization] 2015-04-05 13:32:32,135 cassandra RPC address is nil
 INFO [Initialization] 2015-04-05 13:32:32,135 agent RPC address is 192.168.56.30
 INFO [Initialization] 2015-04-05 13:32:32,135 agent RPC broadcast address is 192.168.56.30
ERROR [Initialization] 2015-04-05 13:32:32,342 Can't connect to Cassandra, retrying
com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /127.0.0.1:9042 (com.datastax.driver.core.TransportException: [/127.0.0.1:9042] Cannot connect))
        at com.datastax.driver.core.ControlConnection.reconnectInternal(ControlConnection.java:220)
        at com.datastax.driver.core.ControlConnection.connect(ControlConnection.java:78)
        at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:1231)
        at com.datastax.driver.core.Cluster.init(Cluster.java:158)
        at com.datastax.driver.core.Cluster.connect(Cluster.java:246)
        at clojurewerkz.cassaforte.client$connect_or_close.doInvoke(client.clj:149)
        at clojure.lang.RestFn.invoke(RestFn.java:410)
        at clojurewerkz.cassaforte.client$connect.invoke(client.clj:165)
        at opsagent.cassandra$setup_cassandra$fn__2491.invoke(cassandra.clj:269)
        at again.core$with_retries_STAR_$fn__2363.invoke(core.clj:98)
        at again.core$with_retries_STAR_.invoke(core.clj:97)
        at opsagent.cassandra$setup_cassandra.invoke(cassandra.clj:267)
        at opsagent.opsagent$setup_cassandra.invoke(opsagent.clj:152)
        at opsagent.jmx$determine_ip.invoke(jmx.clj:276)
        at opsagent.jmx$setup_jmx$fn__2867.invoke(jmx.clj:293)
        at clojure.lang.AFn.run(AFn.java:24)
        at java.lang.Thread.run(Thread.java:745)

This retry repeats several times. I checked what is listening on port 9042:

vagrant@dsenode03:~$ sudo netstat -alnpt | grep 9042
tcp        0      0 192.168.56.30:9042      0.0.0.0:*               LISTEN      5490/java

and saw that something is already listening on this port:

vagrant@dsenode03:~$ sudo lsof -p 5490
# cut some output
COMMAND  PID  USER      FD   TYPE DEVICE SIZE/OFF NODE NAME
java     5490 cassandra cwd   DIR
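Worth noting from the output above: the agent's driver tries /127.0.0.1:9042 (the default config even shows :hosts [127.0.0.1]), while the process that "occupies" 9042, pid 5490, is Cassandra itself, bound only to 192.168.56.30. So the port isn't taken by anything foreign; the native transport just isn't listening on loopback. That reading can be confirmed from the node with a trivial connectivity check (a minimal sketch in plain Python; the IPs are the ones from the log above):

```python
import socket

def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds -- the minimum
    the Java driver needs before it can speak the native protocol."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# On the node from the log one would expect:
#   port_open("127.0.0.1", 9042)      -> False (matches the agent's NoHostAvailableException)
#   port_open("192.168.56.30", 9042)  -> True  (Cassandra is up, bound to the VM IP)
```

If that is what you see, the direction to investigate is making the addresses agree: either have the native transport listen on an address the agent uses (rpc_address in cassandra.yaml; note the log says "cassandra RPC address is nil"), or point the agent at 192.168.56.30 instead of loopback. Which knob applies depends on the DSE/agent version, so treat this as a hypothesis to verify rather than a definitive fix.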
Can cqlsh COPY command be run through Spark?
Hi, I am looking for whether the cqlsh COPY command can be run from a Spark Scala program, and whether it would benefit from the parallelism achieved by Spark. I am doing something like below:

val conf = new SparkConf(true)
  .setMaster("spark://Master-Host:7077")
  .setAppName("Load Cs Table using COPY TO")
lazy val sc = new SparkContext(conf)

import com.datastax.spark.connector.cql.CassandraConnector

CassandraConnector(conf).withSessionDo { session =>
  session.execute("truncate wfcdb.test_wfctotal;")
  session.execute("COPY wfcdb.test_wfctotal (wfctotalid, timesheetitemid, employeeid, durationsecsqty, wageamt, moneyamt, applydtm, laboracctid, paycodeid, startdtm, stimezoneid, adjstartdtm, adjapplydtm, enddtm, homeaccountsw, notpaidsw, wfcjoborgid, unapprovedsw, durationdaysqty, updatedtm, totaledversion, acctapprovalnum) FROM '/home/analytics/Documents/wfctotal.dat' WITH DELIMITER = '|' AND HEADER = true;")
}

Regards,
Tarun Tiwari | Workforce Analytics-ETL | Kronos India
M: +91 9540 28 27 77 | Tel: +91 120 4015200
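For context on the snippet above: COPY is a command implemented inside the cqlsh shell, not a CQL statement the server understands, so sending it through a driver session as shown will almost certainly fail with a syntax error; bulk loading from Spark usually means reading the delimited file and writing through the connector instead. The parallelism the question asks about comes from partitioning the input file, an idea that can be sketched with no Spark or Cassandra dependency (load_chunk below is a hypothetical stand-in for real driver inserts, not connector API):

```python
import csv
from concurrent.futures import ThreadPoolExecutor

def load_chunk(rows):
    # A real loader would issue CQL INSERTs here via a driver session;
    # this placeholder just counts the rows it "loaded".
    return len(rows)

def parallel_load(path, chunk_size=1000, workers=4):
    """Split a pipe-delimited file (with header) into chunks and 'load'
    the chunks concurrently, returning the total row count."""
    with open(path, newline="") as f:
        reader = csv.reader(f, delimiter="|")
        next(reader)  # HEADER = true: skip the header line
        rows = list(reader)
    chunks = [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(load_chunk, chunks))
```

Spark gives you essentially this splitting for free across executors when the file is read as an RDD/DataFrame, which is where the speedup over single-threaded cqlsh COPY would come from.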
How much disk is needed to compact Leveled compaction?
Hi,

I have a cluster of 5 nodes running Cassandra 2.1.3. The nodes use about 50-57% of their 1T SSDs. One node managed to compact all its data; during one compaction it used almost 100% of the drive. The other nodes refuse to continue compaction, claiming that there is not enough disk space. From the documentation, LeveledCompactionStrategy should be able to compact my data, at least as I understand this passage:

"Size-tiered compaction requires at least as much free disk space for compaction as the size of the largest column family. Leveled compaction needs much less space for compaction, only 10 * sstable_size_in_mb. However, even if you're using leveled compaction, you should leave much more free disk space available than this to accommodate streaming, repair, and snapshots, which can easily use 10GB or more of disk space. Furthermore, disk performance tends to decline after 80 to 90% of the disk space is used, so don't push the boundaries."

This is the disk usage; node4 is the only one that could compact everything:

node0: /dev/disk1  931Gi  534Gi  396Gi  57%  /
node1: /dev/disk1  931Gi  513Gi  417Gi  55%  /
node2: /dev/disk1  931Gi  526Gi  404Gi  57%  /
node3: /dev/disk1  931Gi  507Gi  424Gi  54%  /
node4: /dev/disk1  931Gi  475Gi  456Gi  51%  /

When I try to compact the other ones I get this:

objc[18698]: Class JavaLaunchHelper is implemented in both /Library/Java/JavaVirtualMachines/jdk1.8.0_40.jdk/Contents/Home/bin/java and /Library/Java/JavaVirtualMachines/jdk1.8.0_40.jdk/Contents/Home/jre/lib/libinstrument.dylib. One of the two will be used. Which one is undefined.
error: Not enough space for compaction, estimated sstables = 2894, expected write size = 485616651726
-- StackTrace --
java.lang.RuntimeException: Not enough space for compaction, estimated sstables = 2894, expected write size = 485616651726
        at org.apache.cassandra.db.compaction.CompactionTask.checkAvailableDiskSpace(CompactionTask.java:293)
        at org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:127)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
        at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:76)
        at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59)
        at org.apache.cassandra.db.compaction.CompactionManager$7.runMayThrow(CompactionManager.java:512)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)

I did not set sstable_size_in_mb; I use the 160MB default. Is it normal that compaction needs so much disk space? What would be the best solution to overcome this problem?

Thanks for your help
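The numbers in the error are worth decoding: the expected write size, 485,616,651,726 bytes (roughly 452 GiB), exceeds the ~424 GiB free on the fullest nodes, so the task refuses to start. One plausible explanation (an assumption, not something stated in the thread): a manually triggered major compaction rewrites the whole table in a single task, so the steady-state "10 * sstable_size_in_mb" figure for leveled compaction does not apply to it. The guard the stack trace points at amounts to a comparison like this (a simplified sketch, not the actual Cassandra source):

```python
def check_available_disk_space(expected_write_size: int, free_bytes: int,
                               estimated_sstables: int) -> None:
    """Rough analogue of CompactionTask.checkAvailableDiskSpace: refuse to
    start a compaction whose estimated output may not fit on disk."""
    if expected_write_size > free_bytes:
        raise RuntimeError(
            "Not enough space for compaction, estimated sstables = "
            f"{estimated_sstables}, expected write size = {expected_write_size}")

# Numbers from the error above vs. node3's free space (~424 GiB):
# check_available_disk_space(485_616_651_726, 424 * 1024**3, 2894)  # raises RuntimeError
```

If that reading is right, the practical options are to let leveled compaction proceed incrementally on its own rather than forcing a major compaction, or to free/add disk so the one-shot rewrite fits.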
Re: Timeseries analysis using Cassandra and partition by date period
"Hi, I switched from HBase to Cassandra and am trying to find a solution for timeseries analysis on top of Cassandra."

Depending on what you're looking for, you might want to check out KairosDB. 0.95 beta2 just shipped yesterday as well, so you have good timing. https://github.com/kairosdb/kairosdb

On Sat, Apr 4, 2015 at 11:29 AM, Serega Sheypak serega.shey...@gmail.com wrote:

Okay, so bucketing by day/week/month is a capacity-planning matter, and here are the actual questions I want to ask. As a conclusion: I have an events table:

CREATE TABLE user_plans (
    id timeuuid,
    user_id timeuuid,
    event_ts timestamp,
    event_type int,
    some_other_attr text,
    PRIMARY KEY (user_id, event_ts)
);

which fits tactical queries:

select smth from user_plans where user_id = 'xxx' and event_ts > now()

Then I create a second table user_plans_daily (or weekly, monthly) with DDL:

CREATE TABLE user_plans_daily (
    ymd int,
    user_id timeuuid,
    event_ts timestamp,
    event_type int,
    some_other_attr text,
    PRIMARY KEY ((ymd, user_id), event_ts)
) WITH CLUSTERING ORDER BY (event_ts DESC);

And this table is good for answering strategic questions:

select * from user_plans_daily where ymd in ()

And I should avoid a long condition inside the IN clause; that is why you suggest I create a bigger bucket, correct?

2015-04-04 20:00 GMT+02:00 Jack Krupansky jack.krupan...@gmail.com:

It sounds like your time bucket should be a month, but it depends on the amount of data per user per day and your main query range. Within the partition you can then query for a range of days. Yes, all of the rows within a partition are stored on one physical node as well as the replica nodes.

-- Jack Krupansky

On Sat, Apr 4, 2015 at 1:38 PM, Serega Sheypak serega.shey...@gmail.com wrote:

"non-equal relation on a partition key is not supported"

Ok, can I generate a select query:

select some_attributes from events where ymd = 20150101 or ymd = 20150102 or ymd = 20150103 ... or ymd = 20150331

"The partition key determines which node can satisfy the query"

So you mean that all rows with the same (ymd, user_id) would be on one physical node?

2015-04-04 16:38 GMT+02:00 Jack Krupansky jack.krupan...@gmail.com:

Unfortunately, a non-equal relation on a partition key is not supported. You would need to bucket by some larger unit, like a month, and then use the date/time as a clustering column for the row key. Then you could query within the partition. The partition key determines which node can satisfy the query. Designing your partition key judiciously is the key (haha!) to performant Cassandra applications.

-- Jack Krupansky

On Sat, Apr 4, 2015 at 9:33 AM, Serega Sheypak serega.shey...@gmail.com wrote:

Hi, we plan to have 10^8 users, and each user could generate 10 events per day. So we have 10^8 records per day and 10^8 * 30 records per month. Our analysis time window could be from 1 to 6 months. Right now the PK is PRIMARY KEY (user_id, event_ts), where event_ts is the exact timestamp of the event. So you suggest this approach:

PRIMARY KEY ((ymd, user_id), event_ts)
WITH CLUSTERING ORDER BY (event_ts DESC);

where ymd = 20150102 (the second of January)?

What happens to writes: do SSTables with past days (ymd < current_day) stay untouched and take no part in the compaction process, since there are no changes to them?

What happens to reads: I issue a query:

select some_attributes from events where ymd >= 20150101 and ymd < 20150301

Does Cassandra skip SSTables which don't have ymd in the specified range and give me a kind of partition elimination, like in traditional DBs?

2015-04-04 14:41 GMT+02:00 Jack Krupansky jack.krupan...@gmail.com:

It depends on the actual number of events per user, but simply bucketing the partition key can give you the same effect: clustering rows by time range. A composite partition key could be comprised of the user name and the date. It also depends on the data rate: is it many events per day or just a few events per week, and over what time period? You need to be careful: you don't want your Cassandra partitions to be too big (millions of rows) or too small (just a few or even one row per partition).

-- Jack Krupansky

On Sat, Apr 4, 2015 at 7:03 AM, Serega Sheypak serega.shey...@gmail.com wrote:

Hi, I switched from HBase to Cassandra and am trying to find a solution for timeseries analysis on top of Cassandra. I have an entity named Event, with these attributes:

user_id - the user who triggered the event
event_ts - when the event happened
event_type - type of event
some_other_attr - other attributes we don't care about right now

The DDL for the Event entity looks like this:

CREATE TABLE user_plans (
    id timeuuid,
    user_id timeuuid,
    event_ts timestamp,
    event_type int,
    some_other_attr text,
    PRIMARY KEY (user_id, event_ts)
);

The table is unbounded; it would grow continuously during the application's lifetime. I want to ask the question: Cassandra, give me all events where
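The scheme this thread converges on, a (ymd, user_id) partition key with event_ts as a clustering column, leaves one job on the client side: computing the list of ymd buckets for a query window, since range predicates on a partition key are not supported. A small sketch of that helper (the function name and shapes are illustrative, not from the thread):

```python
from datetime import date, timedelta

def ymd_buckets(start: date, end: date, granularity: str = "monthly"):
    """Generate integer ymd partition-key buckets covering [start, end]."""
    buckets = []
    if granularity == "monthly":
        d = start.replace(day=1)
        while d <= end:
            buckets.append(d.year * 100 + d.month)  # e.g. 201501
            # jump to the first day of the next month
            d = (d.replace(day=28) + timedelta(days=4)).replace(day=1)
    else:  # daily
        d = start
        while d <= end:
            buckets.append(d.year * 10000 + d.month * 100 + d.day)  # e.g. 20150102
            d += timedelta(days=1)
    return buckets

# A 3-month window becomes 3 partitions per user instead of ~90:
# ymd_buckets(date(2015, 1, 1), date(2015, 3, 31)) -> [201501, 201502, 201503]
```

With monthly buckets, the 1-to-6-month analysis window mentioned above touches at most 6 partitions per user, which keeps the IN list short, in line with the advice in the thread.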
Re: How much disk is needed to compact Leveled compaction?
You appear to have multiple Java binaries in your path. That needs to be resolved.

sent from my mobile
Daemeon C.M. Reiydelle
USA 415.501.0198
London +44.0.20.8144.9872

On Apr 5, 2015 1:40 AM, Jean Tremblay jean.tremb...@zen-innovations.com wrote:
[original message quoted in full; see "How much disk is needed to compact Leveled compaction?" above]