iScala or Scala-notebook
Hey everyone, I know this was asked before, but I'm wondering whether there have been any updates since. Are there any plans to integrate iScala/Scala-notebook with Spark in the near future? This seems like something a lot of people would find very useful, so I was wondering if anyone has started working on it. Thanks, Eric -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/iScala-or-Scala-notebook-tp10127.html Sent from the Apache Spark User List mailing list archive at Nabble.com.
Increase storage.MemoryStore size
Hey everyone, I'm having some trouble increasing the storage available for a broadcast variable. The total seems to come out a little under 512 MB every time, and I can't figure out which configuration setting to change to raise it. I'm seeing this in the terminal on my driver machine:
    INFO storage.MemoryStore: Block broadcast_0 stored as values to memory (estimated size 426.5 MB, free 64.2 MB)
I can change spark.executor.memory, and that does seem to increase the amount of RAM available on my nodes, but it doesn't change the storage size reported for my broadcast variables. Any ideas? Thanks, Eric -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Increase-storage-MemoryStore-size-tp7516.html
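As a rough check on the "a little less than 512MB" observation, the two figures in the log line can be added up. The sketch below only parses the line quoted in the message; the function and variable names are made up for illustration, and it assumes the "free" figure is reported after the block has been stored, so that the sum approximates the store's total capacity. Since the line is printed by the driver, the limit is presumably tied to the driver JVM rather than to spark.executor.memory, but that is an inference and is not confirmed anywhere in this thread.

```python
import re

# Log line copied from the message above.
LOG_LINE = (
    "INFO storage.MemoryStore: Block broadcast_0 stored as values to memory "
    "(estimated size 426.5 MB, free 64.2 MB)"
)


def parse_memory_store_line(line):
    """Return (block_size_mb, free_mb) from a MemoryStore 'stored as values' line.

    Hypothetical helper for illustration; it only handles sizes given in MB.
    """
    match = re.search(r"estimated size ([\d.]+) MB, free ([\d.]+) MB", line)
    if match is None:
        raise ValueError("not a recognised MemoryStore log line")
    return float(match.group(1)), float(match.group(2))


block_mb, free_mb = parse_memory_store_line(LOG_LINE)

# Assumption: "free" is measured after the block was stored, so the sum
# approximates the total capacity of the memory store.
capacity_mb = block_mb + free_mb
print(f"approximate MemoryStore capacity: {capacity_mb:.1f} MB")
```

If the assumption holds, the result lands just under 512 MB, which matches the description in the message.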
Re: Doubts about MLlib.linalg in python
Hi Congrui Yi, Spark is implemented in Scala, so new features become available in Scala/Java first. PySpark is a Python wrapper around the Scala code, so it won't always have the latest features. This is especially true for the machine learning library. Eric -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Doubts-about-MLlib-linalg-in-python-tp7523p7530.html
Calliope Frame size larger than max length
Hey all, I'm working with Calliope to run jobs on a Cassandra cluster in standalone mode. On some larger jobs I run into the following error:
java.lang.RuntimeException: Frame size (20667866) larger than max length (15728640)!
    at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.executeQuery(CqlPagingRecordReader.java:665)
    at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.computeNext(CqlPagingRecordReader.java:322)
    at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.computeNext(CqlPagingRecordReader.java:289)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
    at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader.nextKeyValue(CqlPagingRecordReader.java:205)
    at com.tuplejump.calliope.cql3.Cql3CassandraRDD$$anon$1.hasNext(Cql3CassandraRDD.scala:73)
    at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:724)
    at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:720)
    at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:884)
    at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:884)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:109)
    at org.apache.spark.scheduler.Task.run(Task.scala:53)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:213)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:46)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:45)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
    at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:45)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
The max length (15728640 bytes) is 15 MB, which is the default frame size Cassandra uses. Has anyone seen this before? Are there common workarounds? I'd much rather not have to poke around changing Cassandra settings, but I can change Spark settings as much as I like. My program itself is extremely simple since I'm only testing: I'm just calling count() on the RDD I created with casbuilder. Thanks, Eric -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Calliope-Frame-size-larger-than-max-length-tp4469.html
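To put the two numbers from the exception message side by side, here is a small arithmetic sketch. It uses only the byte counts quoted in the error; the constant and function names are illustrative. It shows that the limit is exactly 15 MiB and that the rejected frame is roughly 20 MiB, about 31% over the limit.

```python
MIB = 1024 * 1024

# Byte counts copied from the exception message above.
MAX_FRAME_BYTES = 15728640
ACTUAL_FRAME_BYTES = 20667866


def to_mib(num_bytes):
    """Convert a byte count to mebibytes (1 MiB = 1024 * 1024 bytes)."""
    return num_bytes / MIB


overshoot_bytes = ACTUAL_FRAME_BYTES - MAX_FRAME_BYTES
overshoot_ratio = overshoot_bytes / MAX_FRAME_BYTES

print(f"limit:     {to_mib(MAX_FRAME_BYTES):.2f} MiB")
print(f"frame:     {to_mib(ACTUAL_FRAME_BYTES):.2f} MiB")
print(f"overshoot: {overshoot_bytes} bytes ({overshoot_ratio:.0%} over the limit)")
```

The size of the overshoot suggests the frame limit would need to go above 20 MB for this particular query to pass, or the amount of data fetched per request would need to shrink; which of the two is practical with Calliope is not something this thread settles.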