Thanks for taking this on, Ted!
On Sat, Feb 28, 2015 at 4:17 PM, Ted Yu yuzhih...@gmail.com wrote:
I have created SPARK-6085 with pull request:
https://github.com/apache/spark/pull/4836
Cheers
On Sat, Feb 28, 2015 at 12:08 PM, Corey Nolet cjno...@gmail.com wrote:
+1 to a better default
+1 to a better default as well.
We were working fine until we ran against a real dataset which was much
larger than the test dataset we were using locally. It took me a couple of
days and a lot of digging through logs to figure out that this value was
what was causing the problem.
On Sat, Feb 28, 2015 at
if there was an
automatic partition reconfiguration function that automagically did that...
On Tue, Feb 24, 2015 at 3:20 AM, Corey Nolet cjno...@gmail.com wrote:
I *think* this may have been related to the default memory overhead
setting being too low. I raised the value to 1G and tried my job again
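For anyone hitting the same thing, a minimal sketch of raising that value from a SparkConf (assuming the Spark 1.x YARN property; the 1024 MB figure just mirrors the 1G above and is only illustrative):
import org.apache.spark.SparkConf

// Hedged sketch: raise the YARN executor memory overhead (value in MB; illustrative)
val conf = new SparkConf().set("spark.yarn.executor.memoryOverhead", "1024")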
be listening to a
partition.
Yes, my understanding is that multiple receivers in one group are the
way to consume a topic's partitions in parallel.
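A minimal sketch of that pattern with the receiver-based Kafka API of that era (zkQuorum, the consumer group, topic name, and receiver count are all illustrative):
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.kafka.KafkaUtils

// Hedged sketch: several receivers in the same consumer group, unioned into one stream
def parallelKafkaStream(ssc: StreamingContext, zkQuorum: String, numReceivers: Int) = {
  val streams = (1 to numReceivers).map { _ =>
    KafkaUtils.createStream(ssc, zkQuorum, "my-consumer-group", Map("my-topic" -> 1))
  }
  ssc.union(streams)
}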
On Sat, Feb 28, 2015 at 12:56 AM, Corey Nolet cjno...@gmail.com wrote:
Looking @ [1], it seems to recommend pulling from multiple Kafka topics in
order to parallelize data received from Kafka over multiple nodes. I notice
in [2], however, that one of the createConsumer() functions takes a
groupId. So am I understanding correctly that creating multiple DStreams
with the
Zhang zzh...@hortonworks.com
wrote:
Currently in Spark, it looks like there is no easy way to know the
dependencies. They are resolved at run time.
Thanks.
Zhan Zhang
On Feb 26, 2015, at 4:20 PM, Corey Nolet cjno...@gmail.com wrote:
Ted. That one I know. It was the dependency part I
Let's say I'm given 2 RDDs and told to store them in a sequence file and
they have the following dependency:
val rdd1 = sparkContext.sequenceFile().cache()
val rdd2 = rdd1.map()
How would I tell, programmatically, without being the one who built rdd1 and
rdd2, whether or not rdd2 depends on rdd1?
I see the rdd.dependencies() function; does that include ALL the
dependencies of an RDD? Is it safe to assume I can say
rdd2.dependencies.contains(rdd1)?
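My current understanding, as a hedged sketch: rdd.dependencies only exposes the direct parents, so answering this seems to require walking the graph recursively, something like:
import org.apache.spark.rdd.RDD

// Hedged sketch: walk the lineage to see whether `ancestor` appears anywhere upstream of `rdd`;
// comparison by reference here is illustrative
def dependsOn(rdd: RDD[_], ancestor: RDD[_]): Boolean =
  rdd.dependencies.exists(dep => (dep.rdd eq ancestor) || dependsOn(dep.rdd, ancestor))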
On Thu, Feb 26, 2015 at 4:28 PM, Corey Nolet cjno...@gmail.com wrote:
Let's say I'm given 2 RDDs and told to store them in a sequence file
the execution
if there are no shuffle dependencies between the RDDs.
Thanks.
Zhan Zhang
On Feb 26, 2015, at 1:28 PM, Corey Nolet cjno...@gmail.com wrote:
Let's say I'm given 2 RDDs and told to store them in a sequence file and
they have the following dependency:
val rdd1
be the behavior that myself and all my coworkers
expected.
On Thu, Feb 26, 2015 at 6:26 PM, Corey Nolet cjno...@gmail.com wrote:
I should probably mention that my example case is much oversimplified.
Let's say I've got a tree, a fairly complex one where I begin a series of
jobs at the root which
in almost all cases. That much, I
don't know how hard it is to implement. But I speculate that it's
easier to deal with it at that level than as a function of the
dependency graph.
On Thu, Feb 26, 2015 at 10:49 PM, Corey Nolet cjno...@gmail.com wrote:
I'm trying to do the scheduling myself now
future { rdd1.saveAsHadoopFile(...) }
future { rdd2.saveAsHadoopFile(...) }
In this way, rdd1 will be calculated once, and the two saveAsHadoopFile calls
will happen concurrently.
Thanks.
Zhan Zhang
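A slightly fuller sketch of what Zhan describes, assuming the shared parent is cached so it is only materialized once (I'm using saveAsObjectFile with illustrative paths in place of the elided saveAsHadoopFile arguments):
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

rdd1.cache()  // computed once, reused by both saves
val saves = Future.sequence(Seq(
  Future { rdd1.saveAsObjectFile("hdfs:///out/rdd1") },
  Future { rdd2.saveAsObjectFile("hdfs:///out/rdd2") }
))
Await.result(saves, Duration.Inf)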
On Feb 26, 2015, at 3:28 PM, Corey Nolet cjno...@gmail.com wrote:
What confused me
:
* Return information about what RDDs are cached, if they are in mem or on disk,
* how much space they take, etc.
*/
@DeveloperApi
def getRDDStorageInfo: Array[RDDInfo] = {
Cheers
On Thu, Feb 26, 2015 at 4:00 PM, Corey Nolet cjno...@gmail.com wrote:
Zhan,
This is exactly what I'm
This vote was supposed to close on Saturday but it looks like no PMC members
voted (other than the implicit vote from Patrick). Was there a discussion
offline to cut an RC2? Was the vote extended?
On Mon, Feb 23, 2015 at 6:59 AM, Robin East robin.e...@xense.co.uk wrote:
Running ec2 launch scripts
SPARK-5183 SPARK-5180 Document data source API
SPARK-3650 Triangle Count handles reverse edges incorrectly
SPARK-3511 Create a RELEASE-NOTES.txt file in the repo
On Mon, Feb 23, 2015 at 1:55 PM, Corey Nolet cjno...@gmail.com wrote:
This vote was supposed to close on Saturday but it looks
?
spark.shuffle.service.enabled = true
On 21.2.2015. 17:50, Corey Nolet wrote:
I'm experiencing the same issue. Upon closer inspection I'm noticing
that executors are being lost as well. Thing is, I can't figure out how
they are dying. I'm using MEMORY_AND_DISK_SER and I've got over 1.3TB
:
Could you try to turn on the external shuffle service?
spark.shuffle.service.enabled = true
On 21.2.2015. 17:50, Corey Nolet wrote:
I'm experiencing the same issue. Upon closer inspection I'm noticing
that executors are being lost as well. Thing is, I can't figure out how
they are dying. I'm
-
but I have a suspicion this may have been the cause of the executors being
killed by the application master.
On Feb 23, 2015 5:25 PM, Corey Nolet cjno...@gmail.com wrote:
I've got the opposite problem with regards to partitioning. I've got over
6000 partitions for some of these RDDs which
I'm experiencing the same issue. Upon closer inspection I'm noticing that
executors are being lost as well. Thing is, I can't figure out how they are
dying. I'm using MEMORY_AND_DISK_SER and I've got over 1.3TB of memory
allocated for the application. I was thinking perhaps it was possible that
a
The Apache Accumulo project is happy to announce its 1.6.2 release.
Version 1.6.2 is the most recent bug-fix release in its 1.6.x release line.
This version includes numerous bug fixes as well as a performance
improvement over previous versions. Existing users of 1.6.x are encouraged
to upgrade
+1 (non-binding)
- Verified signatures using [1]
- Built on MacOSX Yosemite
- Built on Fedora 21
Each build was run with the Hadoop-2.4 version and the yarn, hive, and
hive-thriftserver profiles.
I am having trouble getting all the tests to pass in a single run on both
machines but we have this
Thanks, Keith! Josh deserves credit for the release notes.
We'll publish the site and I'll get the announcement together.
On Wed, Feb 18, 2015 at 11:34 AM, Josh Elser josh.el...@gmail.com wrote:
+1 ditto. Mirrors appear updated as well. I just fixed another
s/1.6.1/1.6.2/ on the sidebar. I
The Apache Accumulo project is happy to announce its 1.6.2 release.
Version 1.6.2 is the most recent bug-fix release in its 1.6.x release line.
This version includes numerous bug fixes as well as a performance
improvement over previous versions. Existing users of 1.6.x are encouraged
to upgrade
Forwarding to dev.
-- Forwarded message --
From: Corey Nolet cjno...@apache.org
Date: Wed, Feb 18, 2015 at 12:25 PM
Subject: [ANNOUNCE] Apache Accumulo 1.6.2 Released
To: u...@accumulo.apache.org, annou...@apache.org
The Apache Accumulo project is happy to announce its 1.6.2
Niranda,
I'm not sure if I'd say Spark's use of Jetty to expose its UI monitoring
layer constitutes a use of two web servers in a single product. Hadoop
uses Jetty, as do many other applications today that need embedded
HTTP layers for serving up their monitoring UIs to users. This is
We've been using Commons Configuration to pull our properties out of
properties files and system properties (prioritizing system properties over
the others). We add those properties to our SparkConf explicitly, and we use
ArgoPartser to get the command-line argument for which property file to
load.
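A minimal sketch of that layering, assuming Commons Configuration 1.x and an illustrative file name (the actual command-line parsing is left out):
import scala.collection.JavaConverters._
import org.apache.commons.configuration.{CompositeConfiguration, PropertiesConfiguration, SystemConfiguration}
import org.apache.spark.SparkConf

val config = new CompositeConfiguration()
config.addConfiguration(new SystemConfiguration())                      // system properties win
config.addConfiguration(new PropertiesConfiguration("app.properties"))  // illustrative file name

val sparkConf = new SparkConf()
config.getKeys.asScala.foreach { key =>
  sparkConf.set(key.toString, config.getString(key.toString))
}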
...@gmail.com wrote:
Great work, Corey!
What else do we need to do? Release notes? Do you have the
javadoc/artifact deployments under control?
Corey Nolet wrote:
The vote is now closed. The release of Apache Accumulo 1.6.2 RC5 has been
accepted with 3 +1's and 0 -1's.
On Fri, Feb 13
Billie took on the user manual last time. I'm still not sure how to build
the website output for that.
On Sun, Feb 15, 2015 at 8:58 AM, Corey Nolet cjno...@gmail.com wrote:
Josh- I'm terribly busy this weekend but I am going to tackle the release
notes, publishing the artifacts to the website
. Because of ACCUMULO-3597, I was not
able to get a long randomwalk run. The bug happened shortly after
starting the test. I killed the deadlocked tserver and everything started
running again.
On Wed, Feb 11, 2015 at 8:52 AM, Corey Nolet cjno...@apache.org wrote:
Devs,
Please
NOTICE in native.tar.gz
Corey Nolet wrote:
Devs,
Please consider the following candidate for Apache Accumulo 1.6.2
Branch: 1.6.2-rc5
SHA1: 42943a1817434f1f32e9f0224941aa2fff162e74
Staging Repository:
https://repository.apache.org/content/repositories
I don't remember Oracle ever enforcing that I couldn't include a $ in a
column name, but I also don't think I've ever tried.
When using sqlContext.sql(...), I have a SELECT * from myTable WHERE
locations_$homeAddress = '123 Elm St'
It's telling me $ is invalid. Is this a bug?
time at which the RC5 was announced, which was 2pm UTC on Wednesday,
February 11th.
That would make the vote close on Saturday, February 14th at 2pm UTC (9am
EST, 6am PT)
On Fri, Feb 13, 2015 at 1:38 PM, Corey Nolet cjno...@gmail.com wrote:
Thanks Josh for your verification. Just a reminder
This doesn't seem to have helped.
On Fri, Feb 13, 2015 at 2:51 PM, Michael Armbrust mich...@databricks.com
wrote:
Try using `backticks` to escape non-standard characters.
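A minimal sketch of what that suggestion looks like in practice (the table and column names are the ones from the example above):
// Hedged sketch: backticks let the SQL parser accept the '$' in the column name
sqlContext.sql("SELECT * FROM myTable WHERE `locations_$homeAddress` = '123 Elm St'")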
On Fri, Feb 13, 2015 at 11:30 AM, Corey Nolet cjno...@gmail.com wrote:
I don't remember Oracle ever enforcing that I
Nevermind. I think I may have had a schema-related issue (sometimes
booleans were represented as strings and sometimes as raw booleans, but
when I populated the schema, one or the other was chosen).
On Fri, Feb 13, 2015 at 8:03 PM, Corey Nolet cjno...@gmail.com wrote:
Here are the results
Here are the results of a few different SQL strings (let's assume the
schemas are valid for the data types used):
SELECT * from myTable where key1 = true - no filters are pushed to my
PrunedFilteredScan
SELECT * from myTable where key1 = true and key2 = 5 - 1 filter (key2) is
pushed to my
I was able to get this working by extending KryoRegistrator and setting the
spark.kryo.registrator property.
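A minimal sketch of what worked for me, with MyClass and MySerializer as illustrative placeholders for the actual class and its custom Serializer:
import com.esotericsoftware.kryo.Kryo
import org.apache.spark.SparkConf
import org.apache.spark.serializer.KryoRegistrator

class MyRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo): Unit = {
    // register the custom Serializer for the class it handles (placeholders)
    kryo.register(classOf[MyClass], new MySerializer())
  }
}

val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrator", classOf[MyRegistrator].getName)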
On Thu, Feb 12, 2015 at 12:31 PM, Corey Nolet cjno...@gmail.com wrote:
I'm trying to register a custom class that extends Kryo's Serializer
interface. I can't tell exactly what Class
group should need to fit.
On Wed, Feb 11, 2015 at 2:56 PM, Corey Nolet cjno...@gmail.com wrote:
Doesn't iter still need to fit entirely into memory?
On Wed, Feb 11, 2015 at 5:55 PM, Mark Hamstra m...@clearstorydata.com
wrote:
rdd.mapPartitions { iter =>
val grouped = iter.grouped(batchSize
the
data to a single partition (no matter what window I set) and it seems to
lock up my jobs. I waited for 15 minutes for a stage that usually takes
about 15 seconds and I finally just killed the job in yarn.
On Thu, Feb 12, 2015 at 4:40 PM, Corey Nolet cjno...@gmail.com wrote:
So I tried
I have a temporal data set that I'd like to be able to query using
Spark SQL. The dataset is actually in Accumulo and I've already written a
CatalystScan implementation and RelationProvider[1] to register with the
SQLContext so that I can apply my SQL statements.
With my current
I'm trying to register a custom class that extends Kryo's Serializer
interface. I can't tell exactly what Class the registerKryoClasses()
function on the SparkConf is looking for.
How do I register the Serializer class?
Devs,
Please consider the following candidate for Apache Accumulo 1.6.2
Branch: 1.6.2-rc5
SHA1: 42943a1817434f1f32e9f0224941aa2fff162e74
Staging Repository:
https://repository.apache.org/content/repositories/orgapacheaccumulo-1024/
Source tarball:
I think the word partition here is a tad different from the term
partition that we use in Spark. Basically, I want something similar to
Guava's Iterables.partition [1]; that is, if I have an RDD[People] and I
want to run an algorithm that can be optimized by working on 30 people at a
time, I'd
Doesn't iter still need to fit entirely into memory?
On Wed, Feb 11, 2015 at 5:55 PM, Mark Hamstra m...@clearstorydata.com
wrote:
rdd.mapPartitions { iter =>
  val grouped = iter.grouped(batchSize)
  for (group <- grouped) { ... }
}
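Putting that together, a minimal sketch of processing an RDD in fixed-size batches within each partition (peopleRdd, runAlgorithm, and the batch size of 30 are illustrative placeholders):
val batchSize = 30
val results = peopleRdd.mapPartitions { iter =>
  // grouped() is lazy, so only batchSize records are buffered at a time per partition
  iter.grouped(batchSize).flatMap(group => runAlgorithm(group))
}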
On Wed, Feb 11, 2015 at 2:44 PM, Corey Nolet cjno
I am able to get around the problem by doing a map and getting the Event
out of the EventWritable before I do my collect. I think I'll do that for
now.
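A minimal sketch of that workaround (EventWritable#get and the RDD shape are illustrative; the point is to copy the value out of the reused Writable before collect()):
// Hedged sketch: materialize plain Event objects before collecting to the driver
val events = rdd.map { case (key, eventWritable) => eventWritable.get() }.collect()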
On Tue, Feb 10, 2015 at 6:04 PM, Corey Nolet cjno...@gmail.com wrote:
I am using an input format to load data from Accumulo [1] into a Spark
billion entries.
https://issues.apache.org/jira/browse/ACCUMULO-3576
On Thu, Feb 5, 2015 at 11:00 PM, Corey Nolet cjno...@apache.org wrote:
Devs,
Please consider the following candidate for Apache Accumulo 1.6.2
Branch: 1.6.2-rc4
SHA1
.
--
Christopher L Tubbs II
http://gravatar.com/ctubbsii
On Thu, Feb 5, 2015 at 11:00 PM, Corey Nolet cjno...@apache.org wrote:
Devs,
Please consider the following candidate for Apache Accumulo 1.6.2
Branch: 1.6.2-rc4
SHA1: 0649982c2e395852ce2e4408d283a40d6490a980
Devs,
Please consider the following candidate for Apache Accumulo 1.6.2
Branch: 1.6.2-rc4
SHA1: 0649982c2e395852ce2e4408d283a40d6490a980
Staging Repository:
https://repository.apache.org/content/repositories/orgapacheaccumulo-1022/
Source tarball:
Here's another lightweight example of running a SparkContext in a common
java servlet container: https://github.com/calrissian/spark-jetty-server
On Thu, Feb 5, 2015 at 11:46 AM, Charles Feduke charles.fed...@gmail.com
wrote:
If you want to design something like Spark shell have a look at:
in
instance.volumes
Thanks,
Corey Nolet
My mistake Marcelo, I was looking at the wrong message. That reply was
meant for Bo Yang.
On Feb 4, 2015 4:02 PM, Marcelo Vanzin van...@cloudera.com wrote:
Hi Corey,
On Wed, Feb 4, 2015 at 12:44 PM, Corey Nolet cjno...@gmail.com wrote:
Another suggestion is to build Spark by yourself
[
https://issues.apache.org/jira/browse/ACCUMULO-3549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14305913#comment-14305913
]
Corey Nolet commented on ACCUMULO-3549:
---
So we're comfortable with this change
works for YARN).
Also thread at
http://apache-spark-user-list.1001560.n3.nabble.com/netty-on-classpath-when-using-spark-submit-td18030.html
.
HTH,
Markus
On 02/03/2015 11:20 PM, Corey Nolet wrote:
I'm having a really bad dependency conflict right now with Guava versions
between my Spark
, 2015 at 10:36 AM, Keith Turner ke...@deenlo.com wrote:
On Thu, Jan 29, 2015 at 7:27 PM, Corey Nolet cjno...@gmail.com wrote:
However I am seeing ACCUMULO-3545[1] that
I need to investigate.
Ok. I'll cut another RC as soon as that's complete.
Verification completed
I'm having a really bad dependency conflict right now with Guava versions
between my Spark application in Yarn and (I believe) Hadoop's version.
The problem is, my driver has the version of Guava which my application is
expecting (15.0) while it appears the Spark executors that are working on
my
Congrats guys!
On Tue, Feb 3, 2015 at 7:01 PM, Evan Chan velvia.git...@gmail.com wrote:
Congrats everyone!!!
On Tue, Feb 3, 2015 at 3:17 PM, Timothy Chen tnac...@gmail.com wrote:
Congrats all!
Tim
On Feb 4, 2015, at 7:10 AM, Pritish Nawlakhe
prit...@nirvana-international.com
We have a series of Spark jobs which run in succession over various cached
datasets, do small groupings and transforms, and then call
saveAsSequenceFile() on them.
Each call to save as a sequence file appears to have done its work; the
task says it completed in xxx.x seconds, but then it pauses
had one IT that failed on me from the source
build which we can fix later -- things are looking good otherwise from my
testing.
Thanks for working through this Corey, and Keith for finding bugs :)
Corey Nolet wrote:
Devs,
Please consider the following candidate for Apache Accumulo
I'm looking @ the ShuffledRDD code and it looks like there is a method
setKeyOrdering()- is this guaranteed to order everything in the partition?
I'm on Spark 1.2.0
On Wed, Jan 28, 2015 at 9:07 AM, Corey Nolet cjno...@gmail.com wrote:
In all of the solutions I've found thus far, sorting has been
at 2:38 AM, Corey Nolet cjno...@apache.org
https://mail.google.com/mail/?view=cmfs=1tf=1to=cjno...@apache.org
wrote:
Devs,
Please consider the following candidate for Apache Accumulo 1.6.2
Branch: 1.6.2-rc3
SHA1
I'll start on an RC4 but leave this open for a while in case any more issues
like this pop up.
On Jan 28, 2015 5:24 PM, Keith Turner ke...@deenlo.com wrote:
-1 because of ACCUMULO-3541
On Wed, Jan 28, 2015 at 2:38 AM, Corey Nolet cjno...@apache.org wrote:
Devs,
Please consider
/scala/org/apache/spark/rdd/OrderedRDDFunctions.scala
On Wed, Jan 28, 2015 at 9:16 AM, Corey Nolet cjno...@gmail.com wrote:
I'm looking @ the ShuffledRDD code and it looks like there is a method
setKeyOrdering()- is this guaranteed to order everything in the partition?
I'm on Spark 1.2.0
On Wed
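For the archives, a minimal sketch of the OrderedRDDFunctions route referenced above, which sorts keys within each partition as part of the shuffle (pairRdd, the HashPartitioner, and the partition count are illustrative):
import org.apache.spark.HashPartitioner

val sortedWithinPartitions =
  pairRdd.repartitionAndSortWithinPartitions(new HashPartitioner(100))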
I've read that this is supposed to be a rather significant optimization to
the shuffle system in 1.1.0 but I'm not seeing much documentation on
enabling this in Yarn. I see github classes for it in 1.2.0 and a property
spark.shuffle.service.enabled in the spark-defaults.conf.
The code mentions
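A minimal sketch of what I believe the application-side setting looks like (on YARN the NodeManagers also need the auxiliary shuffle service configured, which isn't shown here):
import org.apache.spark.SparkConf

val conf = new SparkConf().set("spark.shuffle.service.enabled", "true")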
, Corey Nolet cjno...@gmail.com wrote:
I need to be able to take an input RDD[Map[String,Any]] and split it into
several different RDDs based on some partitionable piece of the key
(groups) and then send each partition to a separate set of files in
different folders in HDFS.
1) Would running
Devs,
Please consider the following candidate for Apache Accumulo 1.6.2
Branch: 1.6.2-rc3
SHA1: 3a6987470c1e5090a2ca159614a80f0fa50393bf
Staging Repository:
https://repository.apache.org/content/repositories/orgapacheaccumulo-1021/
Source tarball:
I need to be able to take an input RDD[Map[String,Any]] and split it into
several different RDDs based on some partitionable piece of the key
(groups) and then send each partition to a separate set of files in
different folders in HDFS.
1) Would running the RDD through a custom partitioner be the
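For comparison, a minimal sketch of the naive filter-per-group alternative rather than a custom partitioner (extractGroup and the output path are illustrative placeholders; it makes one pass over the cached RDD per group):
val keyed = rdd.keyBy(m => extractGroup(m)).cache()
val groups = keyed.keys.distinct().collect()
groups.foreach { g =>
  keyed.filter { case (group, _) => group == g }
    .values
    .saveAsObjectFile(s"hdfs:///output/group=$g")   // one folder per group
}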
.
Thanks,
Corey Nolet
---
Basic build with unit tests.
Thanks,
Corey Nolet
/util/HadoopCompatUtil.java
PRE-CREATION
examples/simple/src/main/java/org/apache/accumulo/examples/simple/mapreduce/TeraSortIngest.java
1b8cbaf
Diff: https://reviews.apache.org/r/30280/diff/
Testing
---
Basic build with unit tests.
Thanks,
Corey Nolet
.
Thanks,
Corey Nolet
---
Basic build with unit tests.
Thanks,
Corey Nolet
I believe Josh just committed a fix for the missing license header.
On Mon, Jan 26, 2015 at 1:24 PM, Mike Drob md...@mdrob.com wrote:
---
This is an automatically generated e-mail. To reply, visit:
/30252/#comment114283
Good. I'll add this to the release documentation I've been working on.
- Corey Nolet
On Jan. 25, 2015, 9:38 a.m., Sean Busbey wrote:
---
This is an automatically generated e-mail. To reply, visit:
https
(under a semver patch increment, this should be just as
strong an assertion as the reverse)
http://people.apache.org/~busbey/compat_reports/accumulo/1.6.2_to_1.6.1/compat_report.html
On Fri, Jan 23, 2015 at 8:02 PM, Corey Nolet cjno...@apache.org wrote:
Devs,
Please consider the following
Forwarding discussions to dev.
On Jan 25, 2015 3:22 PM, Josh Elser josh.el...@gmail.com wrote:
plus, I don't think it's valid to call this vote on the user list :)
Corey Nolet wrote:
-1 for backwards compatibility issues described.
-1
Corey, I'm really sorry for the churn. I thought I
Elser josh.el...@gmail.com wrote:
I think we used to have instructions lying around that described how to
use
https://github.com/lvc/japi-compliance-checker (not like that has any
influence on what Sean used, though :D)
Corey Nolet wrote:
Sean- is this what you were using [1]?
[1
, 2015 at 7:50 PM, Corey Nolet cjno...@gmail.com wrote:
I did notice something strange reviewing this RC. It appears the
staging
repo doesn't have hash files for the detached GPG signatures
(*.asc.md5,
*.asc.sha1). That's new. Did you do something special regarding this,
Corey
I did notice something strange reviewing this RC. It appears the staging
repo doesn't have hash files for the detached GPG signatures (*.asc.md5,
*.asc.sha1). That's new. Did you do something special regarding this,
Corey? Or maybe this is just a change with mvn, or maybe it's a change
with
20, 2015 at 11:18 PM, Corey Nolet cjno...@apache.org
wrote:
Devs,
Please consider the following candidate for Apache Accumulo 1.6.2
Branch: 1.6.2-rc1
SHA1: 533d93adb17e8b27c5243c97209796f66c6b8b2d
Staging Repository:
https://repository.apache.org
Devs,
Please consider the following candidate for Apache Accumulo 1.6.2
Branch: 1.6.2-rc1
SHA1: 533d93adb17e8b27c5243c97209796f66c6b8b2d
Staging Repository:
https://repository.apache.org/content/repositories/orgapacheaccumulo-1018/
Source tarball:
., Corey Nolet wrote:
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/29959/
---
(Updated Jan. 16, 2015, 5:06 a.m.)
Review request
and a replaced volume appears in instance.volumes.
Also verified that the error does not appear when 'bin/accumulo init
--add-volumes' is called and the replaced volume does not appear in
instance.volumes
Thanks,
Corey Nolet
, Jan 17, 2015 at 4:29 PM, Michael Armbrust mich...@databricks.com
wrote:
How are you running your test here? Are you perhaps doing a .count()?
On Sat, Jan 17, 2015 at 12:54 PM, Corey Nolet cjno...@gmail.com wrote:
Michael,
What I'm seeing (in Spark 1.2.0) is that the required columns being
Michael,
What I'm seeing (in Spark 1.2.0) is that the required columns being pushed
down to the DataRelation are not the product of the SELECT clause but
rather just the columns explicitly included in the WHERE clause.
Examples from my testing:
SELECT * FROM myTable -- The required columns are
an example [1] of what I'm trying to accomplish.
[1]
https://github.com/calrissian/accumulo-recipes/blob/273/thirdparty/spark/src/main/scala/org/calrissian/accumulorecipes/spark/sql/EventStore.scala#L49
On Fri, Jan 16, 2015 at 10:17 PM, Corey Nolet cjno...@gmail.com wrote:
Hao,
Thanks so much
There's also an example of running a SparkContext in a java servlet
container from Calrissian: https://github.com/calrissian/spark-jetty-server
On Fri, Jan 16, 2015 at 2:31 PM, olegshirokikh o...@solver.com wrote:
The question is about the ways to create a Windows desktop-based and/or
Down:
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala
Examples also can be found in the unit test:
https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/sources
*From:* Corey Nolet
Reynold,
One thing I'd like worked into the public portion of the API is the json
inferencing logic that creates a Set[(String, StructType)] out of
Map[String,Any]. SPARK-5260 addresses this so that I can use Accumulators
to infer my schema instead of forcing a map/reduce phase to occur on an RDD
in instance.volumes.
Also verified that the error does not appear when 'bin/accumulo init
--add-volumes' is called and the replaced volume does not appear in
instance.volumes
Thanks,
Corey Nolet
/Initialize.java
https://reviews.apache.org/r/29959/#comment112605
Just noticed this. We should certainly have the conversation to standardize
on this. I don't mind doing what everyone's been doing, I just need to know
what that is.
- Corey Nolet
On Jan. 16, 2015, 4:37 a.m., Corey Nolet wrote
., Corey Nolet wrote:
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/29959/
---
(Updated Jan. 16, 2015, 4:37 a.m.)
Review request
and a replaced volume appears in instance.volumes.
Also verified that the error does not appear when 'bin/accumulo init
--add-volumes' is called and the replaced volume does not appear in
instance.volumes
Thanks,
Corey Nolet
I have document storage services in Accumulo that I'd like to expose to
Spark SQL. I am able to push down predicate logic to Accumulo to have it
perform only the seeks necessary on each tablet server to grab the results
being asked for.
I'm interested in using Spark SQL to push those predicates
I'm working with RDD[Map[String,Any]] objects all over my codebase. These
objects were all originally parsed from JSON. The processing I do on RDDs
consists of parsing JSON -> grouping/transforming the dataset into a feasible
report -> outputting data to a file.
I've been wanting to infer the schemas
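A minimal sketch of what I have in mind with the existing API (Spark SQL 1.2's jsonRDD; toJsonString is an illustrative placeholder for re-serializing the maps):
val jsonStrings = rdd.map(m => toJsonString(m))   // RDD[Map[String, Any]] -> RDD[String]
val schemaRdd = sqlContext.jsonRDD(jsonStrings)   // infers a schema by scanning the data
schemaRdd.printSchema()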
Just noticed an error in my wording.
It should be: I'm assuming it's not immediately aggregating on the driver
each time I call += on the Accumulator.
On Wed, Jan 14, 2015 at 9:19 PM, Corey Nolet cjno...@gmail.com wrote:
What are the limitations of using Accumulators to get a union of a bunch
What are the limitations of using Accumulators to get a union of a bunch of
small sets?
Let's say I have an RDD[Map[String,Any]] and I want to do:
rdd.map(accumulator += Set(_.get(entityType).get))
What implications does this have for performance? I'm assuming it's not
immediately aggregating
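One hedged sketch that avoids building a Set per record is a collection-backed accumulator; the += happens on the workers inside an action and the merged value is only meaningful on the driver afterwards (the entityType handling is illustrative):
import scala.collection.mutable

val entityTypes = sc.accumulableCollection(mutable.HashSet[String]())
rdd.foreach { m => entityTypes += m("entityType").toString }  // foreach is an action; map alone is lazy
println(entityTypes.value)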
Cui Lin,
The solution largely depends on how you want your services deployed (Java
web container, Spray framework, etc.) and whether you are using a cluster
manager like YARN or Mesos vs. just firing up your own executors and master.
I recently worked on an example for deploying Spark services
I'm seeing this exception when creating a new SparkContext in YARN:
[ERROR] AssociationError [akka.tcp://sparkdri...@coreys-mbp.home:58243] -
[akka.tcp://driverpropsfetc...@coreys-mbp.home:58453]: Error [Shut down
address: akka.tcp://driverpropsfetc...@coreys-mbp.home:58453] [
-CREATION
Diff: https://reviews.apache.org/r/29502/diff/
Testing
---
Wrote an integration test to verify that ScanDataSource is actually setting the
authorizations on the IteratorEnvironment
Thanks,
Corey Nolet
?
Christopher Tubbs wrote:
Probably best to just format and organize imports for all the changed
files. I noticed a lot of other formatting issues, too.
Corey Nolet wrote:
Not sure why IntelliJ defaults to this behavior but it's fixed.
Christopher Tubbs wrote:
Import order