Strange DAG scheduling behavior on currently dependent RDDs

2015-01-07 Thread Corey Nolet
We just updated to Spark 1.2.0 from Spark 1.1.0. We have a small framework that we've been developing that connects various different RDDs together based on some predefined business cases. After updating to 1.2.0, some of the concurrency expectations about how the stages within jobs are executed

Re: What does (### skipped) mean in the Spark UI?

2015-01-07 Thread Corey Nolet
Looks like the number of skipped stages couldn't be formatted. Cheers On Wed, Jan 7, 2015 at 12:08 PM, Corey Nolet cjno...@gmail.com wrote: We just upgraded to Spark 1.2.0 and we're seeing this in the UI.

What does (### skipped) mean in the Spark UI?

2015-01-07 Thread Corey Nolet
We just upgraded to Spark 1.2.0 and we're seeing this in the UI.

Re: Strange DAG scheduling behavior on currently dependent RDDs

2015-01-07 Thread Corey Nolet
lineages. What's strange is that this bug only surfaced when I updated Spark. On Wed, Jan 7, 2015 at 9:12 AM, Corey Nolet cjno...@gmail.com wrote: We just updated to Spark 1.2.0 from Spark 1.1.0. We have a small framework that we've been developing that connects various different RDDs together

Re: Review Request 29502: ACCUMULO-3458 Adding scan authorizations to IteratorEnvironment

2015-01-06 Thread Corey Nolet
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29502/#review66725 --- On Dec. 31, 2014, 3:40 p.m., Corey Nolet wrote

Re: Review Request 29502: ACCUMULO-3458 Adding scan authorizations to IteratorEnvironment

2015-01-06 Thread Corey Nolet
://reviews.apache.org/r/29502/diff/ Testing --- Wrote an integration test to verify that ScanDataSource is actually setting the authorizations on the IteratorEnvironment Thanks, Corey Nolet

Re: Review Request 29502: ACCUMULO-3458 Adding scan authorizations to IteratorEnvironment

2015-01-06 Thread Corey Nolet
on the IteratorEnvironment Thanks, Corey Nolet

Re: Write and Read file through map reduce

2015-01-05 Thread Corey Nolet
Hitarth, I don't know how much direction you are looking for with regards to the formats of the times but you can certainly read both files into the third mapreduce job using the FileInputFormat by comma-separating the paths to the files. The blocks for both files will essentially be unioned
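
The tip above about comma-separating paths to FileInputFormat so that the blocks of both files are unioned can be sketched outside Hadoop. This plain-Python sketch only illustrates the unioning behavior; the helper name and file contents are made up, and it is not the Hadoop API:

```python
import os
import tempfile

def read_union(comma_separated_paths):
    """Mimic FileInputFormat fed comma-separated paths: records from
    every listed file end up in one logical, unioned input."""
    records = []
    for path in comma_separated_paths.split(","):
        with open(path) as f:
            records.extend(line.rstrip("\n") for line in f)
    return records

# Two small files standing in for the outputs of the first two jobs.
d = tempfile.mkdtemp()
a, b = os.path.join(d, "job1.txt"), os.path.join(d, "job2.txt")
with open(a, "w") as f:
    f.write("k1\tv1\n")
with open(b, "w") as f:
    f.write("k2\tv2\n")

# The third job sees both files' records as a single input.
print(read_union(a + "," + b))
```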

Re: Submitting spark jobs through yarn-client

2015-01-03 Thread Corey Nolet
driver application. Here's the example code on github: https://github.com/cjnolet/spark-jetty-server On Fri, Jan 2, 2015 at 11:35 PM, Corey Nolet cjno...@gmail.com wrote: So looking @ the actual code- I see where it looks like --class 'notused' --jar null is set on the ClientBase.scala when yarn

Re: Submitting spark jobs through yarn-client

2015-01-02 Thread Corey Nolet
Looking a little closer @ the launch_container.sh file, it appears to be adding a $PWD/__app__.jar to the classpath but there is no __app__.jar in the directory pointed to by PWD. Any ideas? On Fri, Jan 2, 2015 at 4:20 PM, Corey Nolet cjno...@gmail.com wrote: I'm trying to get a SparkContext

Submitting spark jobs through yarn-client

2015-01-02 Thread Corey Nolet
I'm trying to get a SparkContext going in a web container which is being submitted through yarn-client. I'm trying two different approaches and both seem to be resulting in the same error from the yarn nodemanagers: 1) I'm newing up a SparkContext directly, manually adding all the lib jars from

Re: Submitting spark jobs through yarn-client

2015-01-02 Thread Corey Nolet
2, 2015 at 5:46 PM, Corey Nolet cjno...@gmail.com wrote: .. and looking even further, it looks like the actual command that's executed starting up the JVM to run the org.apache.spark.deploy.yarn.ExecutorLauncher is passing in --class 'notused' --jar null. I would assume this isn't expected

Re: Submitting spark jobs through yarn-client

2015-01-02 Thread Corey Nolet
they aren't making it through. On Fri, Jan 2, 2015 at 5:02 PM, Corey Nolet cjno...@gmail.com wrote: Looking a little closer @ the launch_container.sh file, it appears to be adding a $PWD/__app__.jar to the classpath but there is no __app__.jar in the directory pointed to by PWD. Any ideas? On Fri, Jan

Re: Review Request 29502: ACCUMULO-3458 Adding scan authorizations to IteratorEnvironment

2014-12-31 Thread Corey Nolet
on the IteratorEnvironment Thanks, Corey Nolet

Re: Review Request 29502: ACCUMULO-3458 Adding scan authorizations to IteratorEnvironment

2014-12-31 Thread Corey Nolet
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29502/#review66439 --- On Dec. 31, 2014, 1:46 p.m., Corey Nolet wrote

Re: Review Request 29502: ACCUMULO-3458 Adding scan authorizations to IteratorEnvironment

2014-12-31 Thread Corey Nolet
that ScanDataSource is actually setting the authorizations on the IteratorEnvironment Thanks, Corey Nolet

Re: Review Request 29502: ACCUMULO-3458 Adding scan authorizations to IteratorEnvironment

2014-12-30 Thread Corey Nolet
PRE-CREATION Diff: https://reviews.apache.org/r/29502/diff/ Testing --- Wrote an integration test to verify that ScanDataSource is actually setting the authorizations on the IteratorEnvironment Thanks, Corey Nolet

Submit spark jobs inside web application

2014-12-29 Thread Corey Nolet
I want to have a SparkContext inside of a web application running in Jetty that I can use to submit jobs to a cluster of Spark executors. I am running YARN. Ultimately, I would love it if I could just use something like SparkSubmit.main() to allocate a bunch of resources in YARN when the webapp

How to tell if RDD no longer has any children

2014-12-29 Thread Corey Nolet
Let's say I have an RDD which gets cached and has two children which do something with it: val rdd1 = ...cache() rdd1.saveAsSequenceFile() rdd1.groupBy()..saveAsSequenceFile() If I were to submit both calls to saveAsSequenceFile() in threads to take advantage of concurrency (where

Cached RDD

2014-12-29 Thread Corey Nolet
If I have 2 RDDs which depend on the same RDD like the following: val rdd1 = ... val rdd2 = rdd1.groupBy()... val rdd3 = rdd1.groupBy()... If I don't cache rdd1, will its lineage be calculated twice (once for rdd2 and once for rdd3)?
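
The answer to the question above is yes: each action on an uncached parent re-runs its lineage, while caching evaluates it once. This is a plain-Python sketch simulating lazy evaluation with a run counter, not Spark itself; all the names are stand-ins:

```python
class LazyDataset:
    """Toy stand-in for an RDD: a lazy recipe re-evaluated on every
    action unless cache() was called first."""
    def __init__(self, compute):
        self._compute = compute
        self._cache_enabled = False
        self._materialized = None
        self.runs = 0  # how many times the lineage was actually evaluated

    def cache(self):
        self._cache_enabled = True
        return self

    def collect(self):
        if self._materialized is not None:
            return self._materialized
        self.runs += 1
        result = self._compute()
        if self._cache_enabled:
            self._materialized = result  # keep it for later actions
        return result

rdd1 = LazyDataset(lambda: [x * 2 for x in range(5)])   # NOT cached
rdd2 = [v + 1 for v in rdd1.collect()]                  # first child
rdd3 = [v - 1 for v in rdd1.collect()]                  # second child
print(rdd1.runs)   # 2: the uncached parent ran once per child

rdd1c = LazyDataset(lambda: [x * 2 for x in range(5)]).cache()
rdd2c = [v + 1 for v in rdd1c.collect()]
rdd3c = [v - 1 for v in rdd1c.collect()]
print(rdd1c.runs)  # 1: cached, so the lineage ran only once
```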

Re: When will spark 1.2 released?

2014-12-19 Thread Corey Nolet
The dates of the jars were still of Dec 10th. I figured that was because the jars were staged in Nexus on that date (before the vote). On Fri, Dec 19, 2014 at 12:16 PM, Ted Yu yuzhih...@gmail.com wrote: Looking at: http://search.maven.org/#browse%7C717101892 The dates of the jars were

Re: JIRA Tickets for 1.6.2 Release

2014-12-18 Thread Corey Nolet
Have you started tracking a CHANGES list yet (do we need to update anything added back in 1.6.2)? I did start a CHANGES file in the 1.6.2-SNAPSHOT branch. I figure after the tickets settle down I'll just create a new one. On Thu, Dec 18, 2014 at 2:05 PM, Christopher ctubb...@apache.org wrote:

Re: 1.6.2 candidates

2014-12-17 Thread Corey Nolet
is preferable. On Tue, Dec 16, 2014 at 7:18 PM, Corey Nolet cjno...@gmail.com wrote: I have cycles to spin the RCs- I wouldn't mind finishing the updates (per my notes) of the release documentation as well. On Tue, Dec 16, 2014 at 7:11 PM, Christopher ctubb...@apache.org wrote

JIRA Tickets for 1.6.2 Release

2014-12-17 Thread Corey Nolet
Since we've been discussing cutting an rc0 for testing before we begin the formal release process, I've moved over all the non-blocker tickets from 1.6.2 to 1.6.3 [1]. Many of the tickets that moved haven't been updated since the 1.6.1 release. If there are tickets you feel are necessary for

build.sh script still being used?

2014-12-17 Thread Corey Nolet
I'm working on updating the Making a Release page on our website [1] with more detailed instructions on the steps involved. The Create the candidate section references the build.sh script and I'm contemplating just removing it altogether since it seems like, after quick discussions with a few

Re: 1.6.2 candidates

2014-12-16 Thread Corey Nolet
I have cycles to spin the RCs- I wouldn't mind finishing the updates (per my notes) of the release documentation as well. On Tue, Dec 16, 2014 at 7:11 PM, Christopher ctubb...@apache.org wrote: I think it'd be good to let somebody else exercise the process a bit, but I can make the RCs if

Spark eating exceptions in multi-threaded local mode

2014-12-16 Thread Corey Nolet
I've been running a job in local mode using --master local[*] and I've noticed that, for some reason, exceptions appear to get eaten- as in, I don't see them. If i debug in my IDE, I'll see that an exception was thrown if I step through the code but if I just run the application, it appears

Re: accumulo join order count,sum,avg

2014-12-14 Thread Corey Nolet
A good example of the count/sum/average can be found in our StatsCombiner example [1]. Joins are a complicated one- your implementation of joins will really depend on your data set and the expected sizes of each side of the join. You can obviously always resort to joining data together on

Re: accumulo Scanner

2014-12-11 Thread Corey Nolet
You're going to want to use WholeRowIterator.decodeRow(entry.getKey(), entry.getValue()) for that one. You can do: for(Entry<Key,Value> entry : scanner) { for(Entry<Key,Value> actualEntry : WholeRowIterator.decodeRow(entry.getKey(), entry.getValue()).entrySet()) { // do something with

Re: Accumulo Working Day

2014-12-09 Thread Corey Nolet
Also talked a little about Christopher's working on a new API design: https://github.com/ctubbsii/accumulo/blob/ACCUMULO-2589/ On Tue, Dec 9, 2014 at 11:56 PM, Josh Elser josh.el...@gmail.com wrote: Just so you don't think I forgot, there wasn't really much to report today. Lots of friendly

Possible typo in the Hadoop Latest Stable Release Page

2014-12-09 Thread Corey Nolet
I'm looking @ this page: http://hadoop.apache.org/docs/stable/ Is it a typo that Hadoop 2.6.0 is based on 2.4.1? Thanks.

Re: Running two different Spark jobs vs multi-threading RDDs

2014-12-06 Thread Corey Nolet
Reading the documentation a little more closely, I'm using the wrong terminology. I'm using stages to refer to what Spark is calling a job. I guess application (more than one Spark context) is what I'm asking about On Dec 5, 2014 5:19 PM, Corey Nolet cjno...@gmail.com wrote: I've read

Running two different Spark jobs vs multi-threading RDDs

2014-12-05 Thread Corey Nolet
I've read in the documentation that RDDs can be run concurrently when submitted in separate threads. I'm curious how the scheduler would handle propagating these down to the tasks. I have 3 RDDs: - one RDD which loads some initial data, transforms it and caches it - two RDDs which use the cached
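
The scenario described above (independent actions submitted from separate threads so the scheduler can overlap them, instead of running them strictly one after another) can be sketched with a plain thread pool. This Python sketch only stands in for a driver program; the two jobs and their shared parent data are made up:

```python
from concurrent.futures import ThreadPoolExecutor

# Pretend this is the cached parent RDD, computed once up front.
base = [x * 2 for x in range(10)]

def job_a(data):
    """First dependent job: shift every value and total them."""
    return sum(v + 1 for v in data)

def job_b(data):
    """Second dependent job: find the largest value."""
    return max(data)

# Submitting both actions from separate threads lets a scheduler run
# them concurrently rather than serially.
with ThreadPoolExecutor(max_workers=2) as pool:
    fa = pool.submit(job_a, base)
    fb = pool.submit(job_b, base)
    print(fa.result(), fb.result())  # -> 100 18
```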

Re: [VOTE] ACCUMULO-3176

2014-12-01 Thread Corey Nolet
+1 in case it wasn't inferred from my previous comments. As Josh stated, I'm still confused how the veto still holds technical justification- the changes being made aren't removing methods from the public API. On Mon, Dec 1, 2014 at 3:42 PM, Josh Elser josh.el...@gmail.com wrote: I still don't

Re: Can MiniAccumuloCluster reuse directory?

2014-11-30 Thread Corey Nolet
I had a ticket for that awhile back and I don't believe it was ever completed. By default, it wants to dump out new config files for everything- having it reuse a config file would mean not re-initializing each time and reusing the same instance id + rfiles. ACCUMULO-1378 was it and it looks

[jira] [Commented] (ACCUMULO-3371) Allow user to set Zookeeper port in MiniAccumuloCluster

2014-11-29 Thread Corey Nolet (JIRA)
[ https://issues.apache.org/jira/browse/ACCUMULO-3371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14228916#comment-14228916 ] Corey Nolet commented on ACCUMULO-3371: --- David, http://accumulo.apache.org/1.6

Re: [DISCUSS] Bylaws Change - Majority Approval for Code Changes

2014-11-26 Thread Corey Nolet
Jeremy, The PMC boards in ASF are re On Wed, Nov 26, 2014 at 1:18 PM, Jeremy Kepner kep...@ll.mit.edu wrote: To be effective, most boards need to be small (~5 people) and not involved with day-to-day. Ideally, if someone says let's bring this to the board for a decision the collective

Re: Unsubscribe

2014-11-26 Thread Corey Nolet
send an email to user-unsubscr...@hadoop.apache.org to unsubscribe. On Wed, Nov 26, 2014 at 3:08 PM, Li Chen ahli1...@gmail.com wrote: Please unsubscribe me, too. Li On Wed, Nov 26, 2014 at 3:03 PM, Sufi Nawaz s...@eaiti.com wrote: Please suggest how to unsubscribe from this list. Thank

Re: [VOTE] ACCUMULO-3176

2014-11-25 Thread Corey Nolet
I could understand the veto if the change actually caused one of the issues mentioned above or the issue that Sean is raising. But it does not. The eventual consistency of property updates was an issue before this change and continues to be an issue. This JIRA did not attempt to address the

Re: Configuring custom input format

2014-11-25 Thread Corey Nolet
assigning the object to a temporary variable. Matei On Nov 5, 2014, at 2:54 PM, Corey Nolet cjno...@gmail.com wrote: The closer I look @ the stack trace in the Scala shell, it appears to be the call to toString() that is causing the construction of the Job object to fail. Is there a ways

[jira] [Commented] (ACCUMULO-1817) Create a monitoring bridge similar to Hadoop's GangliaContext that can allow easy pluggable support

2014-11-25 Thread Corey Nolet (JIRA)
[ https://issues.apache.org/jira/browse/ACCUMULO-1817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224975#comment-14224975 ] Corey Nolet commented on ACCUMULO-1817: --- Awesome! Given that people have been

Job object toString() is throwing an exception

2014-11-25 Thread Corey Nolet
I was playing around in the Spark shell and newing up an instance of Job that I could use to configure the inputformat for a job. By default, the Scala shell println's the result of every command typed. It throws an exception when it printlns the newly created instance of Job because it looks like

Re: Job object toString() is throwing an exception

2014-11-25 Thread Corey Nolet
) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) On Tue, Nov 25, 2014 at 9:39 PM, Rohith Sharma K S rohithsharm...@huawei.com wrote: Could you give error message or stack trace? *From:* Corey Nolet [mailto:cjno...@gmail.com] *Sent

Re: [ANNOUNCE] Spark 1.2.0 Release Preview Posted

2014-11-20 Thread Corey Nolet
I was actually about to post this myself- I have a complex join that could benefit from something like a GroupComparator vs having to do multiple groupBy operations. This is probably the wrong thread for a full discussion on this but I didn't see a JIRA ticket for this or anything similar- any

Re: unsubscribe

2014-11-18 Thread Corey Nolet
Abdul, Please send an email to user-unsubscr...@spark.apache.org On Tue, Nov 18, 2014 at 2:05 PM, Abdul Hakeem alhak...@gmail.com wrote: - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional

Re: Contribute Examples/Exercises

2014-11-14 Thread Corey Nolet
, Corey Nolet cjno...@gmail.com wrote: Josh, My worry with a contrib module is that, historically, code which goes moves to a contrib is just one step away from the grave. You do have a good point. My hope was that this could be the beginning of our changing history so that we

Spark Hadoop 2.5.1

2014-11-14 Thread Corey Nolet
I noticed Spark 1.2.0-SNAPSHOT still has 2.4.x in the pom. Since 2.5.x is the current stable Hadoop 2.x, would it make sense for us to update the poms?

Re: Spark Hadoop 2.5.1

2014-11-14 Thread Corey Nolet
specialization needed beyond that. The profile sets hadoop.version to 2.4.0 by default, but this can be overridden. On Fri, Nov 14, 2014 at 3:43 PM, Corey Nolet cjno...@gmail.com wrote: I noticed Spark 1.2.0-SNAPSHOT still has 2.4.x in the pom. Since 2.5.x is the current stable Hadoop 2.x

Re: Contribute Examples/Exercises

2014-11-12 Thread Corey Nolet
+1 for adding the examples to contrib. I was, myself, reading over this email wondering how a set of 11 separate examples on the use of Accumulo would fit into the core codebase- especially as more are contributed over time. I like the idea of giving community members an outlet for contributing

Re: Contribute Examples/Exercises

2014-11-12 Thread Corey Nolet
the community which has been stagnant with respect to new committers for about 9 months now. Corey Nolet wrote: +1 for adding the examples to contrib. I was, myself, reading over this email wondering how a set of 11 separate examples on the use of Accumulo would fit into the core codebase

Spark SQL Lazy Schema Evaluation

2014-11-12 Thread Corey Nolet
I'm loading sequence files containing json blobs in the value, transforming them into RDD[String] and then using hiveContext.jsonRDD(). It looks like Spark reads the files twice- once when I define the jsonRDD() and then again when I actually make my call to hiveContext.sql(). Looking @ the

Re: [VOTE] Designating maintainers for some Spark components

2014-11-06 Thread Corey Nolet
+1 (non-binding) [for original process proposal] Greg, the first time I've seen the word ownership on this thread is in your message. The first time the word lead has appeared in this thread is in your message as well. I don't think that was the intent. The PMC and Committers have a

Re: [VOTE] Designating maintainers for some Spark components

2014-11-06 Thread Corey Nolet
PMC [1] is responsible for oversight and does not designate partial or full committer. There are projects where all committers become PMC and others where PMC is reserved for committers with the most merit (and willingness to take on the responsibility of project oversight, releases, etc...).

Re: [VOTE] Designating maintainers for some Spark components

2014-11-06 Thread Corey Nolet
I'm actually going to change my non-binding to +0 for the proposal as-is. I overlooked some parts of the original proposal that, when reading over them again, do not sit well with me. one of the maintainers needs to sign off on each patch to the component, as Greg has pointed out, does seem to

Re: Selecting Based on Nested Values using Language Integrated Query Syntax

2014-11-06 Thread Corey Nolet
place that there is a problem is 'ln.streetnumber, which prevents the rest of the query from resolving. If you look at the subquery ln, it is only producing two columns: locationName and locationNumber. So streetnumber is not valid. On Tue, Oct 28, 2014 at 8:02 PM, Corey Nolet cjno...@gmail.com

Configuring custom input format

2014-11-05 Thread Corey Nolet
I'm trying to use a custom input format with SparkContext.newAPIHadoopRDD. Creating the new RDD works fine but setting up the configuration file via the static methods on input formats that require a Hadoop Job object is proving to be difficult. Trying to new up my own Job object with the

Re: Configuring custom input format

2014-11-05 Thread Corey Nolet
, Corey Nolet cjno...@gmail.com wrote: I'm trying to use a custom input format with SparkContext.newAPIHadoopRDD. Creating the new RDD works fine but setting up the configuration file via the static methods on input formats that require a Hadoop Job object is proving to be difficult. Trying

Re: Spark SQL takes unexpected time

2014-11-04 Thread Corey Nolet
Michael, I should probably look closer myself @ the design of 1.2 vs 1.1 but I've been curious why Spark's in-memory data uses the heap instead of putting it off heap? Was this the optimization that was done in 1.2 to alleviate GC? On Mon, Nov 3, 2014 at 8:52 PM, Shailesh Birari

Why mapred for the HadoopRDD?

2014-11-04 Thread Corey Nolet
I'm fairly new to spark and I'm trying to kick the tires with a few InputFormats. I noticed the sc.hadoopRDD() method takes a mapred JobConf instead of a MapReduce Job object. Is there future planned support for the mapreduce packaging?

Re: unsubscribe

2014-10-31 Thread Corey Nolet
Hongbin, Please send an email to user-unsubscr...@spark.apache.org in order to unsubscribe from the user list. On Fri, Oct 31, 2014 at 9:05 AM, Hongbin Liu hongbin@theice.com wrote: Apology for having to send to all. I am highly interested in spark, would like to stay in this mailing

Re: Selecting Based on Nested Values using Language Integrated Query Syntax

2014-10-28 Thread Corey Nolet
at 2:19 PM, Corey Nolet cjno...@gmail.com wrote: Is it possible to select if, say, there was an addresses field that had a json array? You can get the Nth item by address.getItem(0). If you want to walk through the whole array look at LATERAL VIEW EXPLODE in HiveQL

Re: Selecting Based on Nested Values using Language Integrated Query Syntax

2014-10-28 Thread Corey Nolet
).collect() res0: Array[org.apache.spark.sql.Row] = Array([John]) This will double show people who have more than one matching address. On Tue, Oct 28, 2014 at 5:52 PM, Corey Nolet cjno...@gmail.com wrote: So it wouldn't be possible to have a json string like this: { name:John, age:53, locations

Re: Selecting Based on Nested Values using Language Integrated Query Syntax

2014-10-28 Thread Corey Nolet
Am I able to do a join on an exploded field? Like if I have another object: { streetNumber:2300, locationName:The Big Building} and I want to join with the previous json by the locations[].number field- is that possible? On Tue, Oct 28, 2014 at 9:31 PM, Corey Nolet cjno...@gmail.com wrote

Re: Selecting Based on Nested Values using Language Integrated Query Syntax

2014-10-28 Thread Corey Nolet
$QueryExecution.sparkPlan(SQLContext.scala:400) On Tue, Oct 28, 2014 at 10:48 PM, Michael Armbrust mich...@databricks.com wrote: Can you println the .queryExecution of the SchemaRDD? On Tue, Oct 28, 2014 at 7:43 PM, Corey Nolet cjno...@gmail.com wrote: So this appears to work just fine: hctx.sql(SELECT

Re: Accumulo version at runtime?

2014-10-24 Thread Corey Nolet
Dylan, I know your original post mentioned grabbing it through the client API but there's not currently a way to do that. As Sean mentioned, you can do it if you have access to the cluster. You can run the reflection Keith provided by adding the files in $ACCUMULO_HOME/lib/ to your classpath and

Re: Raise Java dependency from 6 to 7

2014-10-19 Thread Corey Nolet
A concrete plan and a definite version upon which the upgrade would be applied sounds like it would benefit the community. If you plan far enough out (as Hadoop has done) and give the community enough of a notice, I can't see it being a problem as they would have ample time upgrade. On Sat, Oct

Re: How can I use a time window with the trident api?

2014-10-10 Thread Corey Nolet
I started a project to do sliding and tumbling windows in Storm. It could be used directly or as an example. http://github.com/calrissian/flowmix On Oct 9, 2014 11:54 PM, 姚驰 yaoch...@163.com wrote: Hello, I'm trying to use storm to manipulate our monitoring data, but I don't know how to add a
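
A tumbling window like the ones the flowmix project above implements can be sketched in a few lines: each timestamped event lands in exactly one fixed-width, non-overlapping bucket. This plain-Python sketch illustrates the concept only; it is not flowmix's or Trident's API, and the events are made up:

```python
from collections import defaultdict

def tumbling_windows(events, width):
    """Group (timestamp, value) events into fixed, non-overlapping
    windows that are `width` time units wide."""
    windows = defaultdict(list)
    for ts, value in events:
        windows[ts // width].append(value)  # integer division picks the bucket
    return dict(windows)

events = [(0, "a"), (3, "b"), (5, "c"), (9, "d"), (11, "e")]
print(tumbling_windows(events, 5))
# -> {0: ['a', 'b'], 1: ['c', 'd'], 2: ['e']}
```

A sliding window differs only in that each event can belong to several overlapping buckets.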

[ANNOUNCE] Fluo 1.0.0-alpha-1 Released

2014-10-09 Thread Corey Nolet
The Fluo project is happy to announce the 1.0.0-alpha-1 release of Fluo. Fluo is a transaction layer that enables incremental processing on top of Accumulo. It integrates into Yarn using Apache Twill. This is the first release of Fluo and is not ready for production use. We invite developers to

Re: C++ accumulo client -- native clients for Python, Go, Ruby etc

2014-10-06 Thread Corey Nolet
I'm all for this- though I'm curious to know the thoughts about maintenance and the design. Are we going to use thrift to tie the C++ client calls into the server-side components? Is that going to be maintained through a separate effort or is the plan to have the Accumulo community officially

[ANNOUNCE] Apache 1.6.1 Released

2014-10-03 Thread Corey Nolet
The Apache Accumulo project is happy to announce its 1.6.1 release. Version 1.6.1 is the most recent bug-fix release in its 1.6.x release line. This version includes numerous bug fixes and performance improvements over previous versions. Existing users of 1.6.x are encouraged to upgrade to this

Re: Accumulo Powered By Logo

2014-10-02 Thread Corey Nolet
I think a logo that's more friendly to place in a circle would be useful. The Accumulo logo is very squared off. On Thu, Oct 2, 2014 at 3:39 PM, Mike Drob mad...@cloudera.com wrote: Yea, as an outside observer, I would have no idea what Apache A is, nor any idea how to get more information.

Re: [VOTE] Apache Accumulo 1.6.1 RC1

2014-09-25 Thread Corey Nolet
wonder if it's a JVM thing?) On Wed, Sep 24, 2014 at 9:06 PM, Corey Nolet cjno...@gmail.com wrote: Vote passes with 4 +1's and no -1's. Bill, were you able to get the IT to run yet? I'm still having timeouts on my end as well. On Wed, Sep 24, 2014 at 1:41 PM, Josh Elser josh.el

Re: [accumulo] your /dist/ artifacts - 1 BAD signature

2014-09-25 Thread Corey Nolet
technically anybody could do this, and merge it (along with the version bump to 1.6.2-SNAPSHOT commit) to 1.6.2-SNAPSHOT branch (and forward, with -sours), if Corey doesn't have time/gets busy. -- Christopher L Tubbs II http://gravatar.com/ctubbsii On Thu, Sep 25, 2014 at 2:21 PM, Corey

Re: [VOTE] Apache Accumulo 1.6.1 RC1

2014-09-25 Thread Corey Nolet
directories for the test and the failsafe output. It doesn't fail for me. It's possible that there is some edge case that you and Bill are hitting that I'm not. Corey Nolet wrote: I'm seeing the behavior under Mac OS X and Fedora 19 and they have been consistently

Re: [VOTE] Apache Accumulo 1.6.1 RC1

2014-09-24 Thread Corey Nolet
Bill, I've been having that same IT issue and said the same thing It's not happening to others. I lifted the timeout completely and it never finished. On Wed, Sep 24, 2014 at 1:13 PM, Mike Drob mad...@cloudera.com wrote: Any chance the IRC chats can make it only the ML for posterity? Mike

Re: [DISCUSS] Thinking about branch names

2014-09-23 Thread Corey Nolet
+1 Using separate branches in this manner just adds complexity. I was wondering myself why we needed to create separate branches when all we're doing is tagging/deleting the already released ones. The only difference between where one leaves off and another begins is the name of the branch. On

Re: Apache Storm Graduation to a TLP

2014-09-22 Thread Corey Nolet
Congrats! On Mon, Sep 22, 2014 at 5:16 PM, P. Taylor Goetz ptgo...@gmail.com wrote: I’m pleased to announce that Apache Storm has graduated to a Top-Level Project (TLP), and I’d like to thank everyone in the Storm community for your contributions and help in achieving this important

Re: [VOTE] Apache Accumulo 1.5.2 RC1

2014-09-18 Thread Corey Nolet
If we are concerned with confusion about adoption of new versions, we should make a point to articulate the purpose very clearly in each of the announcements. I was in the combined camp an hour ago and now I'm also thinking we should keep them separate. On Fri, Sep 19, 2014 at 1:16 AM, Josh

Re: AccumuloMultiTableInputFormat IllegalStatementException

2014-09-17 Thread Corey Nolet
, 2014 at 6:50 PM, Corey Nolet-2 [via Apache Accumulo] wrote: Awesome John! It's good to have this documented for future users. Keep us updated! On Sun, Aug 24, 2014 at 11:05 AM, JavaHokie [hidden email]

Re: Decouple topology configuration from code

2014-09-16 Thread Corey Nolet
Awhile ago I had written a camel adapter for storm so that spout inputs could come from camel. Not sure how useful it would be for you but its located here: https://github.com/calrissian/storm-recipes/blob/master/camel/src/main/java/org/calrissian/recipes/camel/spout/CamelConsumerSpout.java Hi

Re: Decouple topology configuration from code

2014-09-16 Thread Corey Nolet
Also, Trident is a DSL for rapidly producing useful analytics in Storm and I've been working on a DSL that makes streams processing for complex event processing possible. That one is located here: https://github.com/calrissian/flowmix On Sep 16, 2014 4:29 AM, dominique.vill...@orange.com wrote:

Re: Time to release 1.6.1?

2014-09-11 Thread Corey Nolet
:) On 9/10/14, 10:43 AM, Corey Nolet wrote: I had posted this to the mailing list originally after a discussion with Christopher at the Accumulo Summit hack-a-thon and because I wanted to get into the release process to help out. Josh, I still wouldn't mind getting together 1.6.1

Re: Tablet server thrift issue

2014-09-01 Thread Corey Nolet
in further. On Fri, Aug 22, 2014 at 11:41 PM, Corey Nolet cjno...@gmail.com wrote: Josh, Your advice is definitely useful- I also thought about catching the exception and retrying with a fresh batch writer but the fact that the batch writer failure doesn't go away without being re-instantiated

Re: AccumuloMultiTableInputFormat IllegalStatementException

2014-08-24 Thread Corey Nolet
I'm thinking this could be a yarn.application.classpath configuration problem in your yarn-site.xml. I meant to ask earlier- how are you building your jar that gets deployed? Are you shading it? Using libjars? On Sun, Aug 24, 2014 at 6:56 AM, JavaHokie soozandjohny...@gmail.com wrote: Hey

Re: AccumuloMultiTableInputFormat IllegalStatementException

2014-08-24 Thread Corey Nolet
Awesome John! It's good to have this documented for future users. Keep us updated! On Sun, Aug 24, 2014 at 11:05 AM, JavaHokie soozandjohny...@gmail.com wrote: Hi Corey, Just to wrap things up, AccumuloMultiTableInputFormat is working really well. This is an outstanding feature I can

Re: AccumuloMultiTableInputFormat IllegalStatementException

2014-08-23 Thread Corey Nolet
Awesome! I was going to recommend checking out the code last night so that you could put some logging statements in there. You've probably noticed this already but the MapWritable does not have static type parameters so it dumps out the fully qualified class name so that it can instantiate it back

Re: Tablet server thrift issue

2014-08-22 Thread Corey Nolet
). https://issues.apache.org/jira/browse/ACCUMULO-2990 On 8/22/14, 4:35 PM, Corey Nolet wrote: Eric Keith, Chris mentioned to me that you guys have seen this issue before. Any ideas from anyone else are much appreciated as well. I recently updated a project's dependencies to Accumulo 1.6.0

Re: Tablet server thrift issue

2014-08-22 Thread Corey Nolet
is that all mutations added before the last flush() happened are durable on the server. Anything else is a guess. I don't know the specifics, but that should be enough to work with (and saving off mutations shouldn't be too costly since they're stored serialized). On 8/22/14, 5:44 PM, Corey Nolet

Re: AccumuloMultiTableInputFormat IllegalStatementException

2014-08-22 Thread Corey Nolet
Hey John, Could you give an example of one of the ranges you are using which causes this to happen? On Fri, Aug 22, 2014 at 11:02 PM, John Yost soozandjohny...@gmail.com wrote: Hey Everyone, The AccumuloMultiTableInputFormat is an awesome addition to the Accumulo API and I am really

Re: AccumuloMultiTableInputFormat IllegalStatementException

2014-08-22 Thread Corey Nolet
The table configs get serialized as base64 and placed in the job's Configuration under the key AccumuloInputFormat.ScanOpts.TableConfigs. Could you verify/print what's being placed in this key in your configuration? On Sat, Aug 23, 2014 at 12:15 AM, JavaHokie soozandjohny...@gmail.com wrote:
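
The verification step suggested above can be illustrated with a toy configuration map. Only the key name comes from the email; the payload string and the plain-base64 encoding here are made-up stand-ins for Accumulo's actual serialized form:

```python
import base64

# Key named in the thread; the value format below is purely illustrative.
KEY = "AccumuloInputFormat.ScanOpts.TableConfigs"

conf = {}  # stand-in for a Hadoop Configuration

# Writing side: serialize the table configs and store them base64-encoded.
table_configs = "table1=range(a,b);table2=range(c,d)"  # made-up payload
conf[KEY] = base64.b64encode(table_configs.encode()).decode("ascii")

# Verification side, as the email suggests: print and decode what is
# actually stored under the key before the job runs.
stored = conf.get(KEY)
print(stored is not None, base64.b64decode(stored).decode())
```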

Re: AccumuloMultiTableInputFormat IllegalStatementException

2014-08-22 Thread Corey Nolet
The tests I'm running aren't using the native Hadoop libs either. If you don't mind, a little more code as to how you are setting up your job would be useful. That's weird the key in the config would be null. Are you using the job.getConfiguration()? On Sat, Aug 23, 2014 at 12:31 AM, JavaHokie

Re: AccumuloMultiTableInputFormat IllegalStatementException

2014-08-22 Thread Corey Nolet
at 1:11 AM, Corey Nolet cjno...@gmail.com wrote: Job.getInstance(configuration) copies the configuration and makes its own. Try doing your debug statement from earlier on job.getConfiguration() and let's see what the base64 string looks like. On Sat, Aug 23, 2014 at 1:00 AM, JavaHokie

Re: AccumuloMultiTableInputFormat IllegalStatementException

2014-08-22 Thread Corey Nolet
That code I posted should be able to validate where you are getting hung up. Can you try running that on the machine and seeing if it prints the expected tables/ranges? Also, are you running the job live? What does the configuration look like for the job on your resource manager? Can you see if

Re: Kafka + Storm

2014-08-14 Thread Corey Nolet
Kafka is also distributed in nature, which is not something easily achieved by queuing brokers like ActiveMQ or JMS (1.0) in general. Kafka allows data to be partitioned across many machines which can grow as necessary as your data grows. On Thu, Aug 14, 2014 at 11:20 PM, Justin Workman

Re: Good way to test when topology in local cluster is fully active

2014-08-05 Thread Corey Nolet
it handles IPv4/6. Try adding the following JVM parameter when running your tests: -Djava.net.preferIPv4Stack=true -Taylor On Aug 4, 2014, at 8:49 PM, Corey Nolet cjno...@gmail.com wrote: I'm testing some sliding window algorithms with tuples emitted from a mock spout based on a timer

Re: Good way to test when topology in local cluster is fully active

2014-08-05 Thread Corey Nolet
Sorry- the ipv4 fix worked. On Tue, Aug 5, 2014 at 9:13 PM, Corey Nolet cjno...@gmail.com wrote: This did work. Thanks! On Tue, Aug 5, 2014 at 2:23 PM, P. Taylor Goetz ptgo...@gmail.com wrote: My guess is that the slowdown you are seeing is a result of the new version of ZooKeeper

Re: Good way to test when topology in local cluster is fully active

2014-08-05 Thread Corey Nolet
: -Djava.net.preferIPv4Stack=true -Taylor On Aug 4, 2014, at 8:49 PM, Corey Nolet cjno...@gmail.com wrote: I'm testing some sliding window algorithms with tuples emitted from a mock spout based on a timer but the amount of time it takes the topology to fully start up and activate seems to vary from

Re: Good way to test when topology in local cluster is fully active

2014-08-04 Thread Corey Nolet
); completeTopologyParam.setStormConf(daemonConf); completeTopologyParam.setTopologyName(getTopologyName()); Map result = Testing.completeTopology(cluster, topology, completeTopologyParam); }); -Vincent On Mon, Aug 4, 2014 at 8:49 PM, Corey Nolet cjno...@gmail.com wrote: I'm testing some sliding window algorithms
