Re: Broadcast variable size limit?

2018-08-05 Thread klrmowse
I don't need more, per se... I just need to watch the size of the variable: if it's within the size limit, go ahead and broadcast it; if not, don't broadcast. So that would be a yes, then? (2 GB, or what is it exactly?)
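
A minimal sketch of the check-then-broadcast idea described above, assuming a JavaSparkContext named jsc and a driver-side lookup Map (both hypothetical names). SizeEstimator only gives a rough in-memory estimate, not the serialized size, so the threshold is approximate; Integer.MAX_VALUE is used as the limit because that is the figure discussed in this thread.

    import java.util.Map;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.broadcast.Broadcast;
    import org.apache.spark.util.SizeEstimator;

    public class BroadcastIfSmall {
        private static final long LIMIT_BYTES = Integer.MAX_VALUE; // ~2 GB figure discussed in this thread

        static Broadcast<Map<String, String>> maybeBroadcast(JavaSparkContext jsc,
                                                             Map<String, String> lookupTable) {
            long estimated = SizeEstimator.estimate(lookupTable); // rough in-memory size estimate
            if (estimated < LIMIT_BYTES) {
                return jsc.broadcast(lookupTable);  // small enough: ship once to every executor
            }
            return null;                            // too large: fall back to a join or another strategy
        }
    }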

Broadcast variable size limit?

2018-08-05 Thread klrmowse
Is it currently still ~2 GB (Integer.MAX_VALUE bytes)? Or am I misinformed? That is what a Google search and scouring this mailing list seem to say. Thanks

Re: [EXT] [Spark 2.x Core] .collect() size limit

2018-05-01 Thread klrmowse
Okay, I may have found an alternative/workaround to using .collect() for what I am trying to achieve... Initially, for the Spark application I am working on, I would call .collect() on two separate RDDs into a couple of ArrayLists (which was the reason I was asking what the size limit on the ...
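
The workaround itself is cut off in the archive, but the original approach being replaced looks roughly like this in the Java API (RDD name and element type are hypothetical); both resulting lists live entirely in driver memory, which is why the size-limit question below matters.

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.spark.api.java.JavaRDD;

    public class CollectTwoRdds {
        // Materialize an RDD on the driver as a mutable ArrayList.
        static List<String> toDriverList(JavaRDD<String> rdd) {
            return new ArrayList<>(rdd.collect()); // whole RDD must fit in driver memory
        }
    }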

[Spark 2.x Core] .collect() size limit

2018-04-28 Thread klrmowse
I am currently trying to find a workaround for the Spark application I am working on so that it does not have to use .collect(), but for now it is going to have to use .collect(). What is the size limit (driver memory) of an RDD that .collect() can work with? I've been scouring ...
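
Not an answer to the exact limit (note that collect() is also bounded by spark.driver.maxResultSize, 1 GB by default), but a hedged sketch of one lower-memory alternative: toLocalIterator() pulls one partition at a time to the driver, so peak driver memory is roughly the largest partition rather than the whole RDD. Element type is hypothetical.

    import java.util.Iterator;
    import org.apache.spark.api.java.JavaRDD;

    public class StreamToDriver {
        // Alternative to collect(): stream the RDD to the driver one partition at a time.
        static void printAll(JavaRDD<String> rdd) {
            Iterator<String> it = rdd.toLocalIterator();
            while (it.hasNext()) {
                System.out.println(it.next());
            }
        }
    }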

--driver-memory allocation question

2018-04-20 Thread klrmowse
Newb question... say memory per node is 16 GB for 6 nodes (for a total of 96 GB for the cluster). Is 16 GB the maximum amount of memory that can be allocated to the driver (since it is, after all, 16 GB per node)? Thanks
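
A sketch of where the driver heap gets set, assuming the 6 x 16 GB cluster above. The driver is a single JVM on a single node, so its heap cannot exceed what one node can offer minus OS/YARN overhead; the 12g figure below is purely illustrative.

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;

    public class DriverMemorySketch {
        public static void main(String[] args) {
            // The driver runs in one JVM on one node, so it is bounded by that node's 16 GB
            // (minus OS and YARN overhead). In practice the value is passed at launch time, e.g.:
            //
            //   spark-submit --driver-memory 12g ... app.jar
            //
            // because the driver JVM is already sized by the time application code runs;
            // setting spark.driver.memory inside SparkConf here would come too late.
            SparkConf conf = new SparkConf().setAppName("driver-memory-sketch");
            JavaSparkContext jsc = new JavaSparkContext(conf);
            // ... job body ...
            jsc.stop();
        }
    }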

Re: [Spark 2.x Core] Job writing out an extra empty part-0000* file

2018-04-20 Thread klrmowse
Well... it turns out that extra part-* file goes away when I limit --num-executors to 1 or 2 (leaving it at the default maxes it out, which in turn gives an extra empty part-file). I guess the test data I'm using only requires that many executors.
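
A hedged alternative to capping --num-executors: coalesce the final RDD down to the number of part-files actually wanted before writing, so no empty partition is written out. RDD types and the output path below are hypothetical.

    import org.apache.spark.api.java.JavaPairRDD;

    public class WriteWithoutEmptyParts {
        static void write(JavaPairRDD<String, Integer> result, String outputPath) {
            result.coalesce(2)                 // 2 output partitions -> part-00000 and part-00001
                  .saveAsTextFile(outputPath); // one file per non-empty partition
        }
    }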

[Spark 2.x Core] Job writing out an extra empty part-0000* file

2018-04-16 Thread klrmowse
The Spark job succeeds (and with correct output), except there is always an extra part-* file, and it is empty... I even set the number of partitions to only 2 via spark-submit, but a third, empty part-file still shows up. Why does it do that? How do I fix it? Thank you

Re: [Spark 2.x Core] Adding to ArrayList inside rdd.foreach()

2018-04-07 Thread klrmowse
Okay, well... I'm working with a pair RDD. I need to extract the values and store them somehow (maybe a simple array?), which I later parallelize and reuse. Since adding to a list is a no-no, what, if any, are the other options? (Java Spark, btw.) Thanks
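
A sketch of two options under those constraints, with hypothetical key/value types: keep the values as a cached RDD (no driver-side list at all), or, if the values are known to be small, collect them and re-parallelize later.

    import java.util.List;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.storage.StorageLevel;

    public class ReuseValues {
        // Option 1: keep the values as an RDD and cache it for reuse -- no driver-side list needed.
        static JavaRDD<Integer> cachedValues(JavaPairRDD<String, Integer> pairs) {
            return pairs.values().persist(StorageLevel.MEMORY_ONLY());
        }

        // Option 2: if the values are small, collect them and re-parallelize when needed.
        static JavaRDD<Integer> collectThenParallelize(JavaSparkContext jsc,
                                                       JavaPairRDD<String, Integer> pairs) {
            List<Integer> values = pairs.values().collect(); // driver must hold all values
            return jsc.parallelize(values);
        }
    }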

[Spark 2.x Core] Adding to ArrayList inside rdd.foreach()

2018-04-07 Thread klrmowse
It gives a NullPointerException... Is there a workaround for adding to an ArrayList during .foreach() of an RDD? Thank you
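
For context, a sketch of why the ArrayList route fails and two alternatives that do work: foreach() runs on the executors against serialized copies of the closure, so a driver-side list is never populated (or the reference blows up). Element type and the accumulator name are hypothetical.

    import java.util.List;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.util.CollectionAccumulator;

    public class ForeachCollectExample {
        // Simplest route, if the data fits on the driver.
        static List<String> viaCollect(JavaRDD<String> rdd) {
            return rdd.collect();
        }

        // Accumulator route: updates made on the executors are merged back on the driver.
        static List<String> viaAccumulator(JavaSparkContext jsc, JavaRDD<String> rdd) {
            CollectionAccumulator<String> acc = jsc.sc().collectionAccumulator("values");
            rdd.foreach(acc::add);
            return acc.value();
        }
    }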

Re: Spark 2.x Core: .setMaster(local[*]) output is different from spark-submit

2018-03-17 Thread klrmowse
For clarification: calling .saveAsTextFile() on the RDD writes to the local fs, but not to HDFS. Anyone?
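
A hedged note on this: the path's scheme decides which filesystem receives the output, and a scheme-less path falls back to fs.defaultFS in the Hadoop config, which typically differs between a local run and the pseudo-distributed setup. Being explicit removes the ambiguity; the paths below are made up.

    import org.apache.spark.api.java.JavaRDD;

    public class ExplicitFsScheme {
        static void save(JavaRDD<String> rdd) {
            rdd.saveAsTextFile("hdfs://localhost:9000/user/someuser/output"); // pseudo-distributed HDFS
            // rdd.saveAsTextFile("file:///tmp/output");                      // local filesystem instead
        }
    }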

Spark 2.x Core: .setMaster(local[*]) output is different from spark-submit

2018-03-16 Thread klrmowse
When I run a job with .setMaster("local[*]"), the output is as expected... but when I run it using YARN (single node, pseudo-distributed HDFS) via spark-submit, the output is fudged: instead of key-value pairs, it only shows one value preceded by a comma, and the rest are blank. What am I ...
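
This doesn't by itself explain the garbled pairs, but one difference worth removing is the hard-coded master: leaving .setMaster() out of the code and passing the master via spark-submit keeps the local and YARN runs otherwise identical. A minimal sketch:

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;

    public class PortableMaster {
        public static void main(String[] args) {
            // No .setMaster() here; choose it at launch time instead:
            //   spark-submit --master yarn ...           (cluster / pseudo-distributed run)
            //   spark-submit --master "local[*]" ...     (local run)
            SparkConf conf = new SparkConf().setAppName("portable-master-example");
            JavaSparkContext jsc = new JavaSparkContext(conf);
            // ... job body unchanged between the two modes ...
            jsc.stop();
        }
    }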