Re: debug jsonRDD problem?

2015-05-28 Thread Michael Stone

On Wed, May 27, 2015 at 02:06:16PM -0700, Ted Yu wrote:

Looks like the exception was caused by resolved.get(prefix ++ a) returning None:

        a => StructField(a.head, resolved.get(prefix ++ a).get, nullable = true)

There are three occurrences of resolved.get() in createSchema(); the None case
should be handled better in each of those places.

My two cents.


Here's the simplest test case I've come up with:

sqlContext.jsonRDD(sc.parallelize(Array("{\"'```'\":\"\"}"))).count()
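
For reference, a minimal sketch of the kind of None handling Ted is suggesting --
not the actual patch, just an illustration. The resolved map, prefix, and the
safeField name below are stand-ins for the internals of JsonRDD.createSchema,
and falling back to StringType is only one possible choice:

import org.apache.spark.sql.types.{DataType, StringType, StructField}

// Hypothetical helper: avoid calling .get on a None by supplying a default
// type when the lookup fails. The real change would live in JsonRDD.scala.
def safeField(resolved: Map[Seq[String], DataType],
              prefix: Seq[String],
              a: Seq[String]): StructField =
  StructField(a.head, resolved.getOrElse(prefix ++ a, StringType), nullable = true)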

Mike Stone




Re: debug jsonRDD problem?

2015-05-27 Thread Michael Stone

On Wed, May 27, 2015 at 01:13:43PM -0700, Ted Yu wrote:

Can you tell us a bit more about the schema of your JSON?


It's fairly simple, consisting of 22 fields whose values are mostly strings
or integers, except that some of the fields are objects with HTTP
header/value pairs. I'd guess it's something in those latter fields that is
causing the problems. The data is 800M rows that I didn't create in the
first place, and I'm in the process of making a simpler test case. What I
was mostly wondering is whether there's an obvious mechanism I'm just
missing to get jsonRDD to report more information about which specific rows
it's having problems with.
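
One approach for narrowing that down is to run the schema inference on
fixed-size slices of the RDD and report which slices fail -- a rough sketch
only (the function name and slice size are arbitrary, and it launches one
Spark job per slice):

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SQLContext

// Sketch: try jsonRDD's schema inference on each slice of rows and print the
// index ranges where it throws.
def findBadSlices(sqlContext: SQLContext, rdd: RDD[String], sliceSize: Long = 100000L): Unit = {
  val indexed = rdd.zipWithIndex().cache()
  val total = indexed.count()
  (0L until total by sliceSize).foreach { lo =>
    val hi = lo + sliceSize
    val slice = indexed.filter { case (_, i) => i >= lo && i < hi }.map(_._1)
    try {
      sqlContext.jsonRDD(slice).count()
    } catch {
      case e: Exception =>
        println(s"schema inference failed for rows [$lo, $hi): ${e.getMessage}")
    }
  }
}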



You can find sample JSON in sql/core/src/test/scala/org/apache/spark/sql/json/TestJsonData.scala


I know jsonRDD works in general; I've used it before without problems. It 
even works on subsets of this data. 


Mike Stone




debug jsonRDD problem?

2015-05-27 Thread Michael Stone
Can anyone provide some suggestions on how to debug this? I'm using Spark 
1.3.1. The JSON itself seems to be valid (other programs can parse it), and 
the problem seems to lie in jsonRDD trying to infer and use a schema.


scala> sqlContext.jsonRDD(rdd).count()
java.util.NoSuchElementException: None.get
   at scala.None$.get(Option.scala:313)
   at scala.None$.get(Option.scala:311)
   at org.apache.spark.sql.json.JsonRDD$$anonfun$14.apply(JsonRDD.scala:105)
   at org.apache.spark.sql.json.JsonRDD$$anonfun$14.apply(JsonRDD.scala:101)
   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
   at scala.collection.immutable.Map$Map1.foreach(Map.scala:109)
   at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
   at scala.collection.AbstractTraversable.map(Traversable.scala:105)
   at org.apache.spark.sql.json.JsonRDD$.org$apache$spark$sql$json$JsonRDD$$makeStruct$1(JsonRDD.scala:101)
   at org.apache.spark.sql.json.JsonRDD$$anonfun$14.apply(JsonRDD.scala:104)
   at org.apache.spark.sql.json.JsonRDD$$anonfun$14.apply(JsonRDD.scala:101)
   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
   at scala.collection.immutable.Map$Map2.foreach(Map.scala:130)
   at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
   at scala.collection.AbstractTraversable.map(Traversable.scala:105)
   at org.apache.spark.sql.json.JsonRDD$.org$apache$spark$sql$json$JsonRDD$$makeStruct$1(JsonRDD.scala:101)
   at org.apache.spark.sql.json.JsonRDD$.createSchema(JsonRDD.scala:132)
   at org.apache.spark.sql.json.JsonRDD$.inferSchema(JsonRDD.scala:56)
   at org.apache.spark.sql.SQLContext.jsonRDD(SQLContext.scala:635)
   at org.apache.spark.sql.SQLContext.jsonRDD(SQLContext.scala:581)
   [...]
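
One way to rule out malformed input before blaming the schema inference is to
parse each line directly with Jackson (which Spark already ships) and count
the failures -- a sketch, assuming rdd is the same RDD[String] passed to
jsonRDD above:

import com.fasterxml.jackson.databind.ObjectMapper

// Sketch: keep only the lines Jackson cannot parse at all.
val unparseable = rdd.mapPartitions { iter =>
  val mapper = new ObjectMapper()  // created per partition; ObjectMapper is not serializable
  iter.filter { line =>
    try { mapper.readTree(line); false }
    catch { case _: Exception => true }
  }
}
println(s"unparseable lines: ${unparseable.count()}")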





dynamicAllocation & spark-shell

2015-04-23 Thread Michael Stone
If I enable dynamicAllocation and then use spark-shell or pyspark, 
things start out working as expected: running simple commands causes new 
executors to start and complete tasks. If the shell is left idle for a 
while, executors start getting killed off:


15/04/23 10:52:43 INFO cluster.YarnClientSchedulerBackend: Requesting to kill executor(s) 368
15/04/23 10:52:43 INFO spark.ExecutorAllocationManager: Removing executor 368 because it has been idle for 600 seconds (new desired total will be 665)

That makes sense. But the action also results in error messages:

15/04/23 10:52:47 ERROR cluster.YarnScheduler: Lost executor 368 on hostname: remote Akka client disassociated
15/04/23 10:52:47 INFO scheduler.DAGScheduler: Executor lost: 368 (epoch 0)
15/04/23 10:52:47 INFO spark.ExecutorAllocationManager: Existing executor 368 has been removed (new total is 665)
15/04/23 10:52:47 INFO storage.BlockManagerMasterActor: Trying to remove executor 368 from BlockManagerMaster.
15/04/23 10:52:47 INFO storage.BlockManagerMasterActor: Removing block manager BlockManagerId(368, hostname, 35877)
15/04/23 10:52:47 INFO storage.BlockManagerMaster: Removed 368 successfully in removeExecutor

After that, trying to run a simple command results in:

15/04/23 10:13:30 ERROR util.Utils: Uncaught exception in thread spark-dynamic-executor-allocation-0
java.lang.IllegalArgumentException: Attempted to request a negative number of executor(s) -663 from the cluster manager. Please specify a positive number!

And then only the single remaining executor attempts to complete the new 
tasks. Am I missing some kind of simple configuration item, is this a bug 
that other people are also seeing, or is this actually expected behavior?
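
For reference, the dynamic-allocation settings involved here, as they would
appear in spark-defaults.conf -- the values are illustrative rather than the
actual configuration in use; the 600-second idle timeout is what drives the
removals shown above:

spark.dynamicAllocation.enabled              true
spark.dynamicAllocation.minExecutors         1
spark.dynamicAllocation.executorIdleTimeout  600
spark.shuffle.service.enabled                true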


Mike Stone




Re: spark.dynamicAllocation.minExecutors

2015-04-16 Thread Michael Stone

On Thu, Apr 16, 2015 at 12:16:13PM -0700, Marcelo Vanzin wrote:

I think Michael is referring to this:

"""
Exception in thread "main" java.lang.IllegalArgumentException: You
must specify at least 1 executor!
Usage: org.apache.spark.deploy.yarn.Client [options]
"""


Yes, sorry, there were too many mins and maxs and I copied the wrong 
line.


Mike Stone




Re: spark.dynamicAllocation.minExecutors

2015-04-16 Thread Michael Stone

On Thu, Apr 16, 2015 at 08:10:54PM +0100, Sean Owen wrote:

Yes, look at what it was before -- it would also reject a minimum of 0.
That's the case you are hitting. 0 is a fine minimum.


How can 0 be a fine minimum if it's rejected? Changing the value is easy 
enough, but in general it's nice for defaults to make sense.
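
For anyone else hitting this, the change in question is just setting the
value explicitly, e.g. in spark-defaults.conf:

spark.dynamicAllocation.minExecutors  1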


Mike Stone




Re: spark.dynamicAllocation.minExecutors

2015-04-16 Thread Michael Stone

On Thu, Apr 16, 2015 at 07:47:51PM +0100, Sean Owen wrote:

IIRC that was fixed already in 1.3

https://github.com/apache/spark/commit/b2047b55c5fc85de6b63276d8ab9610d2496e08b


From that commit:

+ private val minNumExecutors = conf.getInt("spark.dynamicAllocation.minExecutors", 0)
...
+ if (maxNumExecutors == 0) {
+ throw new SparkException("spark.dynamicAllocation.maxExecutors cannot be 0!")




spark.dynamicAllocation.minExecutors

2015-04-16 Thread Michael Stone
The default for spark.dynamicAllocation.minExecutors is 0, but that 
value causes a runtime error and a message that the minimum is 1. 
Perhaps the default should be changed to 1?


Mike Stone




Re: HDP 2.2 AM abort : Unable to find ExecutorLauncher class

2015-03-28 Thread Michael Stone
I've also been having trouble running 1.3.0 on HDP. The 
spark.yarn.am.extraJavaOptions -Dhdp.version=2.2.0.0-2041
configuration directive seems to work with pyspark, but does not seem to 
propagate when using spark-shell. (That is, everything works fine with 
pyspark, and spark-shell fails with the "bad substitution" message.)
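
For what it's worth, a commonly suggested workaround is to define hdp.version
in spark-defaults.conf so that both the driver and the YARN AM pick it up no
matter which shell is used -- a sketch using the version string from above
(untested here):

spark.driver.extraJavaOptions   -Dhdp.version=2.2.0.0-2041
spark.yarn.am.extraJavaOptions  -Dhdp.version=2.2.0.0-2041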


Mike Stone
