iScala or Scala-notebook

2014-07-17 Thread ericjohnston1989
Hey everyone,

I know this has been asked before, but I'm wondering whether there have been
any updates since. Are there any plans to integrate iScala or Scala-notebook
with Spark in the near future?

This seems like something a lot of people would find very useful, so I was
just wondering if anyone has started working on it.
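
In case it's useful to anyone else: as far as I can tell, the manual route
today is to put the Spark assembly jar on the notebook kernel's classpath and
build the SparkContext yourself. A rough, untested sketch (the master URL and
app name are placeholders):

// Rough sketch only: assumes the Spark assembly jar is already on the
// notebook kernel's classpath when iScala/Scala-notebook is launched.
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setMaster("local[*]")          // or a spark://host:7077 master URL
  .setAppName("notebook-session")
val sc = new SparkContext(conf)

sc.parallelize(1 to 100).sum()    // quick smoke test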

Thanks,

Eric



Increase storage.MemoryStore size

2014-06-12 Thread ericjohnston1989
Hey everyone,

I'm having some trouble increasing the default storage size available for
broadcast variables. It looks like the MemoryStore caps out at a little less
than 512 MB every time, and I can't figure out which configuration setting to
change to increase this.

INFO storage.MemoryStore: Block broadcast_0 stored as values to memory
(estimated size 426.5 MB, free 64.2 MB)

(I'm seeing this in the terminal on my driver machine.)

I can change spark.executor.memory, and that seems to increase the amount
of RAM available on my nodes, but it doesn't seem to adjust this storage
size for my broadcast variables. Any ideas?
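
For reference, here's what I've pieced together from the Spark 1.x docs (a
sketch, not something I've verified on my cluster; the sizes below are
placeholders). The MemoryStore capacity of each JVM is roughly
spark.storage.memoryFraction times that JVM's heap, so the ~512 MB cap comes
from the default heap size rather than from any broadcast-specific setting;
and since the log line above is printed by the driver, it's the driver's heap
that matters for it:

// A minimal sketch, assuming a Spark 1.x-era configuration.
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("broadcast-test")
  // Heap per executor: grows each executor's MemoryStore.
  .set("spark.executor.memory", "4g")
  // Fraction of the heap handed to the MemoryStore (default 0.6).
  .set("spark.storage.memoryFraction", "0.6")
val sc = new SparkContext(conf)

// The driver-side MemoryStore (where broadcast_0 above was stored) is sized
// from the driver JVM's heap, which is set when the driver is launched, e.g.
//   spark-submit --driver-memory 4g ...
// or SPARK_DRIVER_MEMORY=4g in the environment.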

Thanks,

Eric



Re: Doubts about MLlib.linalg in python

2014-06-12 Thread ericjohnston1989
Hi Congrui Yi,

Spark is implemented in Scala, so new features land in the Scala/Java APIs
first. PySpark is a Python wrapper around the Scala code, so it won't always
have the latest features. This is especially true for the machine learning
library, MLlib.
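
To make that concrete, here's what the Scala side of the linalg API looks
like as of Spark 1.0 (illustrative only; check the PySpark docs for which
pieces the Python wrapper has picked up so far):

// Illustrative Spark 1.0-era MLlib linalg usage in Scala.
import org.apache.spark.mllib.linalg.{Vector, Vectors}

val dense: Vector  = Vectors.dense(1.0, 0.0, 3.0)
// Sparse vector of size 3 with non-zeros at indices 0 and 2.
val sparse: Vector = Vectors.sparse(3, Array(0, 2), Array(1.0, 3.0))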

Eric



Calliope Frame size larger than max length

2014-04-18 Thread ericjohnston1989
Hey all,

I'm working with Calliope to run Spark jobs against a Cassandra cluster, with
Spark in standalone mode. On some larger jobs I run into the following error:

java.lang.RuntimeException: Frame size (20667866) larger than max length (15728640)!
    at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.executeQuery(CqlPagingRecordReader.java:665)
    at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.computeNext(CqlPagingRecordReader.java:322)
    at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.computeNext(CqlPagingRecordReader.java:289)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
    at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader.nextKeyValue(CqlPagingRecordReader.java:205)
    at com.tuplejump.calliope.cql3.Cql3CassandraRDD$$anon$1.hasNext(Cql3CassandraRDD.scala:73)
    at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:724)
    at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:720)
    at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:884)
    at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:884)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:109)
    at org.apache.spark.scheduler.Task.run(Task.scala:53)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:213)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:46)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:45)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
    at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:45)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)


The max frame size (15728640 bytes) is 15 MB, which is the default thrift
frame size Cassandra uses. Has anyone seen this before? Are there common
workarounds? I'd much rather not poke around changing Cassandra settings,
but I can change Spark settings as much as I like.

My program itself is extremely simple since I'm testing. I'm just using
count() on the RDD I created with casbuilder.
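
In case it helps, these are the two client-side knobs I've found so far
(untested with Calliope, and it's an assumption on my part that a Hadoop
Configuration can be handed to casbuilder at all; both helpers live in the
cassandra-all jar, so neither touches the server's cassandra.yaml):

// Sketch only: assumes the Configuration below reaches Calliope's reader.
import org.apache.hadoop.conf.Configuration
import org.apache.cassandra.hadoop.ConfigHelper
import org.apache.cassandra.hadoop.cql3.CqlConfigHelper

val conf = new Configuration()

// Option 1: raise the client-side thrift frame limit past the 15 MB default
// that shows up in the error above.
ConfigHelper.setThriftFramedTransportSizeInMb(conf, 32)

// Option 2: fetch fewer CQL rows per page (the default is 1000), so each
// response frame stays under the limit.
CqlConfigHelper.setInputCQLPageRowSize(conf, "100")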

Thanks,

Eric