[GitHub] spark pull request: SPARK-1642: Upgrade FlumeInputDStream's FlumeR...

2014-11-10 Thread tmalaska
Github user tmalaska commented on the pull request:

https://github.com/apache/spark/pull/1386#issuecomment-62491569
  
OK, I have a couple of things to do this week and next (HW Spain), but after
that let's get together and talk.

On Mon, Nov 10, 2014 at 8:30 PM, Tathagata Das 
wrote:

> @tmalaska <https://github.com/tmalaska> We dropped the ball on this
> patch. We can work on this whenever you can get some time to update the PR
> with the master.
>
> —
> Reply to this email directly or view it on GitHub
> <https://github.com/apache/spark/pull/1386#issuecomment-62487654>.
>


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-1297 Upgrade HBase dependency to 0.98

2014-08-12 Thread tmalaska
Github user tmalaska commented on the pull request:

https://github.com/apache/spark/pull/1893#issuecomment-51936244
  
+1
I'm all for the update to 0.98 HBase.

Just make sure we address everything that Sean O is asking.  We need this 
to be able to build with Hadoop1 or Hadoop2 based on a profile.

My code for Spark-2447 will need these changes





[GitHub] spark pull request: Spark-2447 : Spark on HBase

2014-08-01 Thread tmalaska
Github user tmalaska closed the pull request at:

https://github.com/apache/spark/pull/1608




[GitHub] spark pull request: Spark-2447 : Spark on HBase

2014-07-26 Thread tmalaska
GitHub user tmalaska opened a pull request:

https://github.com/apache/spark/pull/1608

Spark-2447 : Spark on HBase

Add a common solution for sending upsert actions to HBase (puts, deletes,
and increments)

This is the first pull request: mainly to test the review process, but 
there are still a number of things that I plan to add this week.

1. Clean up the pom file
2. Add unit tests for the HConnectionStaticCache

If I have time I will also add the following:
1. Support for Java
2. Additional unit tests for Java
3. Additional unit tests for Spark Streaming

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tmalaska/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/1608.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1608


commit 6d9c733d4f177292cfc2fda15a6059660bd500f3
Author: tmalaska 
Date:   2014-07-27T03:17:06Z

Spark-2447 : Spark on HBase

Add common solution for sending upsert actions to HBase (put, deletes,
and increment)






[GitHub] spark pull request: SPARK-1642: Upgrade FlumeInputDStream's FlumeR...

2014-07-12 Thread tmalaska
GitHub user tmalaska opened a pull request:

https://github.com/apache/spark/pull/1386

SPARK-1642: Upgrade FlumeInputDStream's FlumeReceiver to support FLUME-2083

This will allow encryption with SSL between Flume and Spark

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tmalaska/spark Spark-2447

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/1386.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1386


commit 76e81d4ba3cf2c6e8d69de8bb7f6d94fa3aa2547
Author: tmalaska 
Date:   2014-07-12T10:10:40Z

SPARK-1642: first draft






[GitHub] spark pull request: SPARK-1478.2: Upgrade FlumeInputDStream's Flum...

2014-07-12 Thread tmalaska
Github user tmalaska closed the pull request at:

https://github.com/apache/spark/pull/1168




[GitHub] spark pull request: SPARK-1478.2: Upgrade FlumeInputDStream's Flum...

2014-07-09 Thread tmalaska
Github user tmalaska commented on the pull request:

https://github.com/apache/spark/pull/1168#issuecomment-48537088
  
Done.  I closed 566.

Anything else?  I'm open to working on anything; just direct me to a JIRA.




[GitHub] spark pull request: SPARK-1478: Upgrade FlumeInputDStream's FlumeR...

2014-07-09 Thread tmalaska
Github user tmalaska closed the pull request at:

https://github.com/apache/spark/pull/566




[GitHub] spark pull request: [SPARK-1478].3: Upgrade FlumeInputDStream's Fl...

2014-07-09 Thread tmalaska
Github user tmalaska commented on the pull request:

https://github.com/apache/spark/pull/1347#issuecomment-48536862
  
Yes, let me figure that out now.




[GitHub] spark pull request: [SPARK-1478].3: Upgrade FlumeInputDStream's Fl...

2014-07-09 Thread tmalaska
Github user tmalaska commented on the pull request:

https://github.com/apache/spark/pull/1347#issuecomment-48536448
  
Man, I'm sorry this is taking so long.  Thank you for your help.




[GitHub] spark pull request: SPARK-1478.2: Upgrade FlumeInputDStream's Flum...

2014-07-08 Thread tmalaska
Github user tmalaska commented on the pull request:

https://github.com/apache/spark/pull/1168#issuecomment-48400281
  
Let me know if I need to do anything




[GitHub] spark pull request: SPARK-1478.2: Upgrade FlumeInputDStream's Flum...

2014-07-07 Thread tmalaska
Github user tmalaska commented on the pull request:

https://github.com/apache/spark/pull/1168#issuecomment-48228048
  
What is the status of this Jira?




[GitHub] spark pull request: SPARK-1478.2: Upgrade FlumeInputDStream's Flum...

2014-06-24 Thread tmalaska
Github user tmalaska commented on the pull request:

https://github.com/apache/spark/pull/1168#issuecomment-47044140
  
Let me know if there is anything I can do to help.
On Jun 24, 2014 6:33 PM, "Tathagata Das"  wrote:

> This is a weird binary compatibility check failure, that should not be
> thrown. We are looking at our end for fixing this and rerunning the tests.
> Once this is figured out, I will merge this.
>
> —
> Reply to this email directly or view it on GitHub
> <https://github.com/apache/spark/pull/1168#issuecomment-47044062>.
>




[GitHub] spark pull request: SPARK-1478.2: Upgrade FlumeInputDStream's Flum...

2014-06-21 Thread tmalaska
Github user tmalaska commented on the pull request:

https://github.com/apache/spark/pull/1168#issuecomment-46767307
  
Thanks tdas, I messed that one up.  I just updated.  It should be good now.




[GitHub] spark pull request: SPARK-1478: Upgrade FlumeInputDStream's FlumeR...

2014-06-21 Thread tmalaska
Github user tmalaska commented on the pull request:

https://github.com/apache/spark/pull/566#issuecomment-46763419
  
New Pull request https://github.com/apache/spark/pull/1168




[GitHub] spark pull request: SPARK-1478.2: Upgrade FlumeInputDStream's Flum...

2014-06-21 Thread tmalaska
GitHub user tmalaska opened a pull request:

https://github.com/apache/spark/pull/1168

SPARK-1478.2: Upgrade FlumeInputDStream's FlumeReceiver to support 
FLUME-1915

SPARK-1478.2: Upgrade FlumeInputDStream's FlumeReceiver to support
FLUME-1915

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tmalaska/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/1168.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1168


commit 12617e51c6f9fbbcf1b21db2cdcda2f7594b10d1
Author: tmalaska 
Date:   2014-06-21T20:03:58Z

SPARK-1478: Upgrade FlumeInputDStream's Flume...

SPARK-1478: Upgrade FlumeInputDStream's FlumeReceiver to support
FLUME-1915






[GitHub] spark pull request: SPARK-1478: Upgrade FlumeInputDStream's FlumeR...

2014-06-21 Thread tmalaska
Github user tmalaska commented on the pull request:

https://github.com/apache/spark/pull/566#issuecomment-46755792
  
I'm going to have to make a new pull request, because I had to drop the repo 
that belonged to this pull request.  I will update the ticket with the 
information when it's ready.




[GitHub] spark pull request: SPARK-1478: Upgrade FlumeInputDStream's FlumeR...

2014-06-20 Thread tmalaska
Github user tmalaska commented on the pull request:

https://github.com/apache/spark/pull/566#issuecomment-46726131
  
No worries.  I'm starting to free up, so I would love to do more work.  I 
will finish this one up, then the Flume encryption one.  Then, if you have 
anything else, let me at it.

Thanks




[GitHub] spark pull request: SPARK-1478: Upgrade FlumeInputDStream's FlumeR...

2014-06-20 Thread tmalaska
Github user tmalaska commented on a diff in the pull request:

https://github.com/apache/spark/pull/566#discussion_r14040514
  
--- Diff: 
external/flume/src/main/scala/org/apache/spark/streaming/flume/FlumeInputDStream.scala
 ---
@@ -134,22 +144,64 @@ private[streaming]
 class FlumeReceiver(
 host: String,
 port: Int,
-storageLevel: StorageLevel
+storageLevel: StorageLevel,
+enableDecompression: Boolean
   ) extends Receiver[SparkFlumeEvent](storageLevel) with Logging {
 
   lazy val responder = new SpecificResponder(
 classOf[AvroSourceProtocol], new FlumeEventServer(this))
-  lazy val server = new NettyServer(responder, new InetSocketAddress(host, 
port))
+  var server: NettyServer = null
+
+  private def initServer() = {
+if (enableDecompression) {
+  val channelFactory = new NioServerSocketChannelFactory
+(Executors.newCachedThreadPool(), Executors.newCachedThreadPool());
+  val channelPipelieFactory = new CompressionChannelPipelineFactory()
+  
+  new NettyServer(
+responder, 
+new InetSocketAddress(host, port),
+channelFactory, 
+channelPipelieFactory, 
+null)
+} else {
+  new NettyServer(responder, new InetSocketAddress(host, port))
+}
+  }
 
   def onStart() {
-server.start()
+synchronized {
+  if (server == null) {
+server = initServer()
+server.start()
+  } else {
+logWarning("Flume receiver being asked to start more then once 
with out close")
+  }
+}
 logInfo("Flume receiver started")
   }
 
   def onStop() {
-server.close()
+synchronized {
+  if (server != null) {
+server.close()
+server = null
+  }
+}
 logInfo("Flume receiver stopped")
   }
 
   override def preferredLocation = Some(host)
 }
+
+private[streaming]
+class CompressionChannelPipelineFactory extends ChannelPipelineFactory {
+
+  def getPipeline() = {
--- End diff --

Cool, will do before the weekend is done.  Thanks




[GitHub] spark pull request: SPARK-1478: Upgrade FlumeInputDStream's FlumeR...

2014-06-20 Thread tmalaska
Github user tmalaska commented on the pull request:

https://github.com/apache/spark/pull/566#issuecomment-46724152
  
Let me know if there is anything I can do to help this go through.

Thanks tdas


On Fri, Jun 20, 2014 at 4:38 PM, Tathagata Das 
wrote:

> Jenkins, test this again.
>
> —
> Reply to this email directly or view it on GitHub
> <https://github.com/apache/spark/pull/566#issuecomment-46724002>.
>




[GitHub] spark pull request: Spark-2173 : Add Master Computer and SuperStep...

2014-06-17 Thread tmalaska
Github user tmalaska commented on the pull request:

https://github.com/apache/spark/pull/#issuecomment-46391465
  
Never mind. I had to close the pull request. I thought about it. The 
ssAccumulator is not accessible from the vprog, which was my goal. I'm going to 
have to use a broadcast. I will have an update for tomorrow.




[GitHub] spark pull request: Spark-2173 : Add Master Computer and SuperStep...

2014-06-17 Thread tmalaska
Github user tmalaska closed the pull request at:

https://github.com/apache/spark/pull/




[GitHub] spark pull request: Spark-2173 : Add Master Computer and SuperStep...

2014-06-17 Thread tmalaska
Github user tmalaska commented on the pull request:

https://github.com/apache/spark/pull/#issuecomment-46391395
  
Wait, this isn't going to get me what I want, because I can't read the 
ssAccumulator in the vprog.  I think I will have to change to a broadcast.  I 
will 




[GitHub] spark pull request: Spark-2173 : Add Master Computer and SuperStep...

2014-06-17 Thread tmalaska
GitHub user tmalaska opened a pull request:

https://github.com/apache/spark/pull/

Spark-2173 : Add Master Computer and SuperStep ...

Add Master Computer and SuperStep Accumulator to Pregel GraphX
Implementation

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tmalaska/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #


commit e174480e0c2650058007e8274b8931943c3798b0
Author: tmalaska 
Date:   2014-06-18T02:59:27Z

Spark-2173 : Add Master Computer and SuperStep ...

Add Master Computer and SuperStep Accumulator to Pregel GraphX
Implemention






[GitHub] spark pull request: SPARK-1478: Upgrade FlumeInputDStream's FlumeR...

2014-06-17 Thread tmalaska
Github user tmalaska commented on the pull request:

https://github.com/apache/spark/pull/566#issuecomment-46372202
  
Hey tdas,

I was going to do 1642 tonight, but I noticed these changes are not in the 
code yet.  What should I do?

Thanks




[GitHub] spark pull request: SPARK-1478: Upgrade FlumeInputDStream's FlumeR...

2014-05-05 Thread tmalaska
Github user tmalaska commented on the pull request:

https://github.com/apache/spark/pull/566#issuecomment-42189283
  
LOL tdas, how's it going?  Just pinging.




[GitHub] spark pull request: SPARK-1478: Upgrade FlumeInputDStream's FlumeR...

2014-05-01 Thread tmalaska
Github user tmalaska commented on the pull request:

https://github.com/apache/spark/pull/566#issuecomment-41949374
  
Hey tdas,

How is this JIRA looking?  Is there anything I need to do to get it passed?




[GitHub] spark pull request: SPARK-1478: Upgrade FlumeInputDStream's FlumeR...

2014-04-29 Thread tmalaska
Github user tmalaska commented on the pull request:

https://github.com/apache/spark/pull/566#issuecomment-41732028
  
I already updated the code and tested it.  Feel free to commit unless you 
see anything wrong.

If you commit it in the next couple of hours, I can start on SPARK-1642 
tonight or tomorrow morning.




[GitHub] spark pull request: SPARK-1478

2014-04-29 Thread tmalaska
Github user tmalaska closed the pull request at:

https://github.com/apache/spark/pull/405




[GitHub] spark pull request: SPARK-1478

2014-04-29 Thread tmalaska
Github user tmalaska commented on the pull request:

https://github.com/apache/spark/pull/405#issuecomment-41671506
  
As soon as I figure out how.  I will look into it after work.




[GitHub] spark pull request: SPARK-1478: Upgrade FlumeInputDStream's FlumeR...

2014-04-29 Thread tmalaska
Github user tmalaska commented on the pull request:

https://github.com/apache/spark/pull/566#issuecomment-41663386
  
Let me know if the changes are OK.  The only difference from what you told 
me to do is that I added a check to prevent a double start.  Let me know if you 
want me to take it out.  If so, I can make the change very quickly.

  if (server == null) {
    server = initServer()
    server.start()
  } else {
    logWarning("Flume receiver being asked to start more than once without close")
  }
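
A minimal, self-contained sketch of the double-start guard described above. `FakeServer` and `GuardedReceiver` are hypothetical stand-ins (not Spark or Avro classes), and the warning counter substitutes for `logWarning`; it only illustrates the idea that a second start is ignored until the receiver has been stopped.

```scala
// Hypothetical stand-in for the real server; only start/close matter here.
class FakeServer {
  var running = false
  def start(): Unit = { running = true }
  def close(): Unit = { running = false }
}

// Hypothetical receiver that creates its server lazily and refuses to
// start a second one while the first is still open.
class GuardedReceiver(newServer: () => FakeServer) {
  private var server: Option[FakeServer] = None
  var warnings = 0 // stand-in for logWarning calls

  def onStart(): Unit = synchronized {
    server match {
      case None =>
        val s = newServer()
        s.start()
        server = Some(s)
      case Some(_) =>
        warnings += 1 // asked to start again without a stop in between
    }
  }

  def onStop(): Unit = synchronized {
    server.foreach(_.close())
    server = None
  }

  def isRunning: Boolean = synchronized { server.exists(_.running) }
}
```

Calling `onStart()` twice creates only one server and records one warning; after `onStop()`, a later `onStart()` creates a fresh server.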




[GitHub] spark pull request: SPARK-1478: Upgrade FlumeInputDStream's FlumeR...

2014-04-28 Thread tmalaska
Github user tmalaska commented on the pull request:

https://github.com/apache/spark/pull/566#issuecomment-41631586
  
Will do.  I will start tomorrow.  Shouldn't take long.




[GitHub] spark pull request: SPARK-1478: Upgrade FlumeInputDStream's FlumeR...

2014-04-28 Thread tmalaska
Github user tmalaska commented on a diff in the pull request:

https://github.com/apache/spark/pull/566#discussion_r12044730
  
--- Diff: 
external/flume/src/main/scala/org/apache/spark/streaming/flume/FlumeUtils.scala 
---
@@ -66,6 +84,23 @@ object FlumeUtils {
   port: Int,
   storageLevel: StorageLevel
 ): JavaReceiverInputDStream[SparkFlumeEvent] = {
-createStream(jssc.ssc, hostname, port, storageLevel)
+createStream(jssc.ssc, hostname, port, storageLevel, false)
+  }
+
+  /**
+   * Creates a input stream from a Flume source.
+   * @param hostname Hostname of the slave machine to which the flume data 
will be sent
+   * @param port Port of the slave machine to which the flume data 
will be sent
+   * @param storageLevel  Storage level to use for storing the received 
objects
+   * @param enableCompression  Should Netty Server decode input stream 
from client  
--- End diff --

Done




[GitHub] spark pull request: SPARK-1478: Upgrade FlumeInputDStream's FlumeR...

2014-04-28 Thread tmalaska
Github user tmalaska commented on a diff in the pull request:

https://github.com/apache/spark/pull/566#discussion_r12044643
  
--- Diff: 
external/flume/src/main/scala/org/apache/spark/streaming/flume/FlumeInputDStream.scala
 ---
@@ -153,3 +181,15 @@ class FlumeReceiver(
 
   override def preferredLocation = Some(host)
 }
+
+private[streaming]
+class CompressionChannelPipelineFactory() extends ChannelPipelineFactory {
--- End diff --

Done




[GitHub] spark pull request: SPARK-1478: Upgrade FlumeInputDStream's FlumeR...

2014-04-28 Thread tmalaska
Github user tmalaska commented on a diff in the pull request:

https://github.com/apache/spark/pull/566#discussion_r12044638
  
--- Diff: 
external/flume/src/test/scala/org/apache/spark/streaming/flume/FlumeStreamSuite.scala
 ---
@@ -85,4 +108,14 @@ class FlumeStreamSuite extends TestSuiteBase {
   assert(outputBuffer(i).head.event.getHeaders.get("test") === 
"header")
 }
   }
+
+  class CompressionChannelFactory(compressionLevel: Int) extends
--- End diff --

done




[GitHub] spark pull request: SPARK-1478: Upgrade FlumeInputDStream's FlumeR...

2014-04-28 Thread tmalaska
Github user tmalaska commented on the pull request:

https://github.com/apache/spark/pull/566#issuecomment-41546071
  
OK, I have reviewed the commits and I will be making changes this morning.  
Thanks tdas.




[GitHub] spark pull request: SPARK-1478: Upgrade FlumeInputDStream's FlumeR...

2014-04-26 Thread tmalaska
GitHub user tmalaska opened a pull request:

https://github.com/apache/spark/pull/566

SPARK-1478: Upgrade FlumeInputDStream's FlumeReceiver to support FLUME-1915



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tmalaska/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/566.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #566


commit 6a390690a19d4fe7d1c3c9029de66b94eb15be45
Author: tmalaska 
Date:   2014-04-26T13:17:02Z

Finished Second draft






[GitHub] spark pull request: SPARK-1584: Upgrade Flume dependency to 1.4.0

2014-04-24 Thread tmalaska
Github user tmalaska commented on the pull request:

https://github.com/apache/spark/pull/507#issuecomment-41344456
  
Uploaded the exclude-all change; I ran an assembly and test-quick, and both worked.

Let me know what I should do next.

Thanks again for the help.




[GitHub] spark pull request: SPARK-1584: Upgrade Flume dependency to 1.4.0

2014-04-24 Thread tmalaska
Github user tmalaska commented on the pull request:

https://github.com/apache/spark/pull/507#issuecomment-41335079
  
Good point @srowen, we may be able to just exclude thrift altogether.  All 
we need is the avro source stuff.

I will exclude from both and see if it works.
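
For illustration only, the exclusion discussed here might look like this on the sbt side. The Flume artifact name (`flume-ng-sdk`), its version, and the Thrift coordinates (`org.apache.thrift` / `libthrift`) are assumptions, not taken from the patch; the Maven build would need an equivalent `<exclusions>` entry for both builds to resolve the same way.

```scala
// build.sbt fragment (sketch, not the actual Spark build definition).
// Assumed coordinates: org.apache.flume:flume-ng-sdk:1.4.0 and
// org.apache.thrift:libthrift as the transitive dependency to drop.
libraryDependencies += ("org.apache.flume" % "flume-ng-sdk" % "1.4.0")
  .exclude("org.apache.thrift", "libthrift")
```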




[GitHub] spark pull request: SPARK-1584: Upgrade Flume dependency to 1.4.0

2014-04-24 Thread tmalaska
Github user tmalaska commented on the pull request:

https://github.com/apache/spark/pull/507#issuecomment-41331043
  
So what should we do?  Flume 1.2.0 is even worse at Thrift 6.1.

There are some people still on Hadoop 1 but most are on Hadoop 2 now.








[GitHub] spark pull request: SPARK-1584: Upgrade Flume dependency to 1.4.0

2014-04-24 Thread tmalaska
Github user tmalaska commented on the pull request:

https://github.com/apache/spark/pull/507#issuecomment-41323794
  
@srowen yes, I agree, and yes, I missed that one.  Maven will use Thrift 7 and 
sbt will use 8.

I will move Maven to 8 as well.




[GitHub] spark pull request: SPARK-1584: Upgrade Flume dependency to 1.4.0

2014-04-24 Thread tmalaska
Github user tmalaska commented on the pull request:

https://github.com/apache/spark/pull/507#issuecomment-41319131
  
@pwendell to be honest, this is a little deeper than I normally go with POM 
specification.  

I think Maven behaves one way and sbt behaves another, and the two are not 
the same.

My goal with this pull request was to get the same outcome for both Maven 
and sbt.  In the end thrift will be included, because Flume would have pulled it 
in anyway.  

It's not a perfect solution, but the only other option is to change the pom in 
Flume 1.4.0, and they have a different requirement of supporting two thrift options. 
I'm not even sure how Flume would honor that requirement without profiles.  

Also, I figured it would be good to have Flume 1.4.0 in Spark 1.0, because 
Flume 1.4.0 is the most commonly used Flume out there and it has some really 
cool functionality I would like to add to the FlumeStream, like compression and 
encryption.

 




[GitHub] spark pull request: SPARK-1584: Upgrade Flume dependency to 1.4.0

2014-04-24 Thread tmalaska
Github user tmalaska commented on the pull request:

https://github.com/apache/spark/pull/507#issuecomment-41317846
  
Yes this is my first Spark commit.  So I'm going to make some mistakes.  :)




[GitHub] spark pull request: SPARK-1584: Upgrade Flume dependency to 1.4.0

2014-04-24 Thread tmalaska
Github user tmalaska commented on the pull request:

https://github.com/apache/spark/pull/507#issuecomment-41316766
  
@berngp I updated the pull request name as you recommended.




[GitHub] spark pull request: SPARK-1584

2014-04-24 Thread tmalaska
Github user tmalaska commented on the pull request:

https://github.com/apache/spark/pull/507#issuecomment-41316558
  
So the tags didn't make it through in the last one.  Here are the parts of 
the pom.xml again.

{profile}
  {id}hadoop-2{/id}
  {activation}
    {property}
      {name}hadoop.profile{/name}
      {value}2{/value}
    {/property}
  {/activation}

//-

{profile}
  {id}hadoop-1.0{/id}
  {activation}
    {property}
      {name}!hadoop.profile{/name}
    {/property}
  {/activation}




[GitHub] spark pull request: SPARK-1584

2014-04-24 Thread tmalaska
Github user tmalaska commented on the pull request:

https://github.com/apache/spark/pull/507#issuecomment-41316256
  
@pwendell I thought that too, but then I noticed the following two parts of the 
Flume 1.4.0 pom.xml file

//
 
  hadoop-1.0
  

  !hadoop.profile

  
//

  hadoop-2
  

  hadoop.profile
  2

  
//

From what I understand, if the property hadoop.profile is equal to 2 
then profile hadoop-2 is used; otherwise hadoop-1.0 is used.

So I'm still of the belief that this is an sbt bug.
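
To make the activation rules concrete, here is a toy model in Scala of how property-based profile activation is meant to behave for the two profiles quoted above. It is only a sketch: `Trigger`, `isActive`, and `activeProfiles` are hypothetical names, and this is not Maven's or sbt's actual code.

```scala
// Toy model of Maven property-based profile activation (illustrative only).
object ProfileActivation {
  // `name` alone means "the property is defined"; a leading '!' means
  // "the property is NOT defined"; `value`, when present, must match exactly.
  final case class Trigger(name: String, value: Option[String] = None)

  def isActive(t: Trigger, props: Map[String, String]): Boolean =
    if (t.name.startsWith("!")) !props.contains(t.name.drop(1))
    else t.value match {
      case Some(v) => props.get(t.name).contains(v)
      case None    => props.contains(t.name)
    }

  // The two triggers from the Flume 1.4.0 profiles quoted in this thread.
  val hadoop1 = Trigger("!hadoop.profile")
  val hadoop2 = Trigger("hadoop.profile", Some("2"))

  // Ids of the profiles that would activate for the given properties.
  def activeProfiles(props: Map[String, String]): List[String] =
    List("hadoop-1.0" -> hadoop1, "hadoop-2" -> hadoop2)
      .collect { case (id, t) if isActive(t, props) => id }
}
```

In this model, no properties activates hadoop-1.0 and hadoop.profile=2 activates hadoop-2. Any other value of hadoop.profile activates neither profile, because the `!hadoop.profile` trigger only fires when the property is absent.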





[GitHub] spark pull request: SPARK-1584

2014-04-24 Thread tmalaska
Github user tmalaska commented on a diff in the pull request:

https://github.com/apache/spark/pull/507#discussion_r11960961
  
--- Diff: project/SparkBuild.scala ---
@@ -605,7 +606,8 @@ object SparkBuild extends Build {
 name := "spark-streaming-flume",
 previousArtifact := sparkPreviousArtifact("spark-streaming-flume"),
 libraryDependencies ++= Seq(
-  "org.apache.flume" % "flume-ng-sdk" % "1.2.0" % "compile" excludeAll(excludeNetty)
+  "org.apache.flume" % "flume-ng-sdk" % "1.4.0" % "compile" excludeAll(excludeNetty, excludeThrift),
+  "org.apache.thrift" % "libthrift" % "0.8.0" % "compile"
--- End diff --

See comment below made at 1:41 EST April 14 2014




[GitHub] spark pull request: SPARK-1584

2014-04-24 Thread tmalaska
Github user tmalaska commented on the pull request:

https://github.com/apache/spark/pull/507#issuecomment-41309801
  
@pwendell  this is a great question.

The answer is that there is either a bug in sbt or I'm missing something in 
SparkBuild.scala.

In the Flume 1.4.0 pom.xml there is a dependency on thrift, but the version 
is declared with a property and that property is defined in a profile.  I'm not 
sure if the issue is related to the property, the profile, or the combination, 
but sbt does not use the value of the thrift version property and I get the 
following exception. 

sbt.ResolveException: unresolved dependency: 
org.apache.thrift#libthrift;${thrift.version}: not found 

Maven works just fine so I left that as is.  

So, with my limited understanding of sbt and of why it was failing, I decided 
to exclude the thrift dependency from Flume 1.4.0 and declare it directly in the 
SparkBuild.scala file.  

I'm open to any and all help here.  I don't know enough about sbt to know 
why it is having trouble with this.

Side note, sbt works fine with Flume 1.3.0.  This is because in Flume 1.3.0 
the thrift version is hard coded in the Flume pom.xml.  Flume 1.4.0 introduces 
the property value.
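
One plausible reading of the exception, which nothing in this thread confirms, is that the resolver never activates the profile, so `thrift.version` is never defined and the placeholder survives as a literal string. The Scala sketch below is a toy model of that interpolation step only; `PlaceholderDemo` and `interpolate` are hypothetical names, not sbt or Ivy code.

```scala
import scala.util.matching.Regex

// Toy model of ${...} property interpolation in a dependency coordinate.
object PlaceholderDemo {
  // Matches ${key} and captures the key.
  private val Placeholder: Regex = """\$\{([^}]+)\}""".r

  // Replace each ${key} with its value; leave it untouched when the key is unknown.
  def interpolate(s: String, props: Map[String, String]): String =
    Placeholder.replaceAllIn(s, m =>
      Regex.quoteReplacement(props.getOrElse(m.group(1), m.matched)))

  def main(args: Array[String]): Unit = {
    val coordinate = "org.apache.thrift#libthrift;${thrift.version}"
    // Property defined, as it would be if a profile had been activated.
    println(interpolate(coordinate, Map("thrift.version" -> "0.8.0")))
    // Property missing: the placeholder is left as-is, as in the error above.
    println(interpolate(coordinate, Map.empty))
  }
}
```

This would also be consistent with Flume 1.3.0 working, since a hard-coded version has no placeholder to resolve.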




[GitHub] spark pull request: SPARK-1584

2014-04-24 Thread tmalaska
Github user tmalaska commented on the pull request:

https://github.com/apache/spark/pull/507#issuecomment-41283871
  
OK this should be good now.  Please review




[GitHub] spark pull request: SPARK-1584

2014-04-23 Thread tmalaska
Github user tmalaska commented on the pull request:

https://github.com/apache/spark/pull/507#issuecomment-41164979
  
There is a build issue related to thrift.version.  This pull request 
should be considered involved.

Researching now.




[GitHub] spark pull request: SPARK-1584

2014-04-23 Thread tmalaska
GitHub user tmalaska opened a pull request:

https://github.com/apache/spark/pull/507

SPARK-1584

Updated the Flume dependency in the maven pom file and the scala build file.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tmalaska/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/507.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #507


commit 5bf56a7152c2ae75d2e554c4f47a6946f8bf2ab0
Author: tmalaska 
Date:   2014-04-23T13:40:43Z

Upgrade flume version






[GitHub] spark pull request: SPARK-1478

2014-04-14 Thread tmalaska
Github user tmalaska commented on the pull request:

https://github.com/apache/spark/pull/405#issuecomment-40421425
  
Yeah no problem.  Thanks for taking the time to review my code.  This is my 
first time committing with Scala :)

Just let me know when ( #300 ) is done and I will check out again.  Also, when 
you have time, I would love to know how else I could help.

I was thinking of adding:
- Encryption for the Flume Stream, as is available in Flume 1.4.0.
- Failure recovery support for when a Flume Stream host goes down and Spark starts 
up the Flume Stream on another node.




[GitHub] spark pull request: SPARK-1478

2014-04-14 Thread tmalaska
GitHub user tmalaska opened a pull request:

https://github.com/apache/spark/pull/405

SPARK-1478

Initial Version

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tmalaska/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/405.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #405


commit c433827db5dfda6f5b1b6aa11e45447525b4aac4
Author: tmalaska 
Date:   2014-04-14T17:37:01Z

SPARK-1478



