[GitHub] spark pull request: SPARK-1642: Upgrade FlumeInputDStream's FlumeR...
Github user tmalaska commented on the pull request: https://github.com/apache/spark/pull/1386#issuecomment-62491569

OK, I have a couple of things to do this week and next (HW Spain). But after that, let's get together and talk.

On Mon, Nov 10, 2014 at 8:30 PM, Tathagata Das wrote:
> @tmalaska <https://github.com/tmalaska> We dropped the ball on this patch. We can work on this whenever you can get some time to update the PR with the master.
> Reply to this email directly or view it on GitHub <https://github.com/apache/spark/pull/1386#issuecomment-62487654>.

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-1297 Upgrade HBase dependency to 0.98
Github user tmalaska commented on the pull request: https://github.com/apache/spark/pull/1893#issuecomment-51936244

+1, I'm all for the update to HBase 0.98. Just make sure we address everything that Sean O is asking. We need this to be able to build with Hadoop1 or Hadoop2 based on a profile. My code for Spark-2447 will need these changes.
[GitHub] spark pull request: Spark-2447 : Spark on HBase
Github user tmalaska closed the pull request at: https://github.com/apache/spark/pull/1608
[GitHub] spark pull request: Spark-2447 : Spark on HBase
GitHub user tmalaska opened a pull request: https://github.com/apache/spark/pull/1608

Spark-2447 : Spark on HBase

Add a common solution for sending upsert actions to HBase (puts, deletes, and increments). This is the first pull request, mainly to test the review process, but there are still a number of things that I plan to add this week:
1. Clean up the pom file
2. Add unit tests for the HConnectionStaticCache

If I have time I will also add the following:
1. Support for Java
2. Additional unit tests for Java
3. Additional unit tests for Spark Streaming

You can merge this pull request into a Git repository by running:
    $ git pull https://github.com/tmalaska/spark master
Alternatively you can review and apply these changes as the patch at:
    https://github.com/apache/spark/pull/1608.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
    This closes #1608

commit 6d9c733d4f177292cfc2fda15a6059660bd500f3
Author: tmalaska
Date: 2014-07-27T03:17:06Z

    Spark-2447 : Spark on HBase
    Add common solution for sending upsert actions to HBase (put, deletes, and increment)
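The upsert pattern the PR describes (sending puts, deletes, and increments to HBase from Spark) can be sketched roughly as follows. This is a hypothetical illustration against the HBase 0.98-era client API, not the PR's actual code; the helper name hbaseBulkPut and its signature are assumptions. It needs Spark and HBase client jars on the classpath.

```scala
// Hypothetical sketch of an HBase bulk-upsert helper in the spirit of Spark-2447.
// hbaseBulkPut and toPut are illustrative names, not from the merged code.
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.{HConnectionManager, Put}
import org.apache.spark.rdd.RDD

object HBaseUpsertSketch {
  // Open one HConnection per partition and send one Put per record,
  // so executors amortize connection setup across a whole partition.
  def hbaseBulkPut[T](rdd: RDD[T], tableName: String, toPut: T => Put): Unit = {
    rdd.foreachPartition { iter =>
      val conf = HBaseConfiguration.create()
      val conn = HConnectionManager.createConnection(conf)
      val table = conn.getTable(tableName)
      try {
        iter.foreach(record => table.put(toPut(record)))
        table.flushCommits() // flush buffered mutations before closing
      } finally {
        table.close()
        conn.close()
      }
    }
  }
}
```

The HConnectionStaticCache mentioned above would presumably replace the per-partition createConnection call with a JVM-wide cached connection; deletes and increments would follow the same foreachPartition shape with Delete and Increment mutations.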
[GitHub] spark pull request: SPARK-1642: Upgrade FlumeInputDStream's FlumeR...
GitHub user tmalaska opened a pull request: https://github.com/apache/spark/pull/1386

SPARK-1642: Upgrade FlumeInputDStream's FlumeReceiver to support FLUME-2083

This will allow encryption with SSL between Flume and Spark.

You can merge this pull request into a Git repository by running:
    $ git pull https://github.com/tmalaska/spark Spark-2447
Alternatively you can review and apply these changes as the patch at:
    https://github.com/apache/spark/pull/1386.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
    This closes #1386

commit 76e81d4ba3cf2c6e8d69de8bb7f6d94fa3aa2547
Author: tmalaska
Date: 2014-07-12T10:10:40Z

    SPARK-1642: first draft
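For context, FLUME-2083 added SSL/TLS support to Flume's Avro RPC source and sink. On the Flume side, pointing an SSL-enabled Avro sink at the Spark FlumeReceiver would look roughly like the sketch below; the agent and sink names, host, port, and truststore path are placeholders, not values from this PR.

```properties
# Hypothetical Flume agent config: an Avro sink shipping events
# to a Spark FlumeReceiver over SSL (all names/paths are placeholders).
agent1.sinks.sparkSink.type = avro
agent1.sinks.sparkSink.channel = memChannel
agent1.sinks.sparkSink.hostname = spark-receiver-host
agent1.sinks.sparkSink.port = 9988
agent1.sinks.sparkSink.ssl = true
agent1.sinks.sparkSink.truststore = /path/to/truststore.jks
agent1.sinks.sparkSink.truststore-password = changeit
agent1.sinks.sparkSink.truststore-type = JKS
```

The Spark side would then need the FlumeReceiver's NettyServer to accept the SSL handshake, which is what this PR wires in.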
[GitHub] spark pull request: SPARK-1478.2: Upgrade FlumeInputDStream's Flum...
Github user tmalaska closed the pull request at: https://github.com/apache/spark/pull/1168
[GitHub] spark pull request: SPARK-1478.2: Upgrade FlumeInputDStream's Flum...
Github user tmalaska commented on the pull request: https://github.com/apache/spark/pull/1168#issuecomment-48537088

Done. I closed 566. Anything else? I'm open to work on anything; just direct me to a JIRA.
[GitHub] spark pull request: SPARK-1478: Upgrade FlumeInputDStream's FlumeR...
Github user tmalaska closed the pull request at: https://github.com/apache/spark/pull/566
[GitHub] spark pull request: [SPARK-1478].3: Upgrade FlumeInputDStream's Fl...
Github user tmalaska commented on the pull request: https://github.com/apache/spark/pull/1347#issuecomment-48536862

Yes, let me figure that out now.
[GitHub] spark pull request: [SPARK-1478].3: Upgrade FlumeInputDStream's Fl...
Github user tmalaska commented on the pull request: https://github.com/apache/spark/pull/1347#issuecomment-48536448

Man, I'm sorry this is taking so long. Thank you for your help.
[GitHub] spark pull request: SPARK-1478.2: Upgrade FlumeInputDStream's Flum...
Github user tmalaska commented on the pull request: https://github.com/apache/spark/pull/1168#issuecomment-48400281

Let me know if I need to do anything.
[GitHub] spark pull request: SPARK-1478.2: Upgrade FlumeInputDStream's Flum...
Github user tmalaska commented on the pull request: https://github.com/apache/spark/pull/1168#issuecomment-48228048

What is the status of this JIRA?
[GitHub] spark pull request: SPARK-1478.2: Upgrade FlumeInputDStream's Flum...
Github user tmalaska commented on the pull request: https://github.com/apache/spark/pull/1168#issuecomment-47044140

Let me know if there is anything I can do to help.

On Jun 24, 2014 6:33 PM, "Tathagata Das" wrote:
> This is a weird binary compatibility check failure, that should not be thrown. We are looking at our end for fixing this and rerunning the tests. Once this is figured out, I will merge this.
> Reply to this email directly or view it on GitHub <https://github.com/apache/spark/pull/1168#issuecomment-47044062>.
[GitHub] spark pull request: SPARK-1478.2: Upgrade FlumeInputDStream's Flum...
Github user tmalaska commented on the pull request: https://github.com/apache/spark/pull/1168#issuecomment-46767307

Thanks tdas, I missed that one. I just updated; it should be good now.
[GitHub] spark pull request: SPARK-1478: Upgrade FlumeInputDStream's FlumeR...
Github user tmalaska commented on the pull request: https://github.com/apache/spark/pull/566#issuecomment-46763419

New pull request: https://github.com/apache/spark/pull/1168
[GitHub] spark pull request: SPARK-1478.2: Upgrade FlumeInputDStream's Flum...
GitHub user tmalaska opened a pull request: https://github.com/apache/spark/pull/1168

SPARK-1478.2: Upgrade FlumeInputDStream's FlumeReceiver to support FLUME-1915

You can merge this pull request into a Git repository by running:
    $ git pull https://github.com/tmalaska/spark master
Alternatively you can review and apply these changes as the patch at:
    https://github.com/apache/spark/pull/1168.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
    This closes #1168

commit 12617e51c6f9fbbcf1b21db2cdcda2f7594b10d1
Author: tmalaska
Date: 2014-06-21T20:03:58Z

    SPARK-1478: Upgrade FlumeInputDStream's FlumeReceiver to support FLUME-1915
[GitHub] spark pull request: SPARK-1478: Upgrade FlumeInputDStream's FlumeR...
Github user tmalaska commented on the pull request: https://github.com/apache/spark/pull/566#issuecomment-46755792

I'm going to have to make a new pull request, because I had to drop the repo that belonged to this pull request. I will update the ticket with the information when it's ready.
[GitHub] spark pull request: SPARK-1478: Upgrade FlumeInputDStream's FlumeR...
Github user tmalaska commented on the pull request: https://github.com/apache/spark/pull/566#issuecomment-46726131

No worries. I'm starting to free up, so I would love to do more work. I will finish this one up, then the Flume encryption one. Then if you have anything else, let me at it. Thanks
[GitHub] spark pull request: SPARK-1478: Upgrade FlumeInputDStream's FlumeR...
Github user tmalaska commented on a diff in the pull request: https://github.com/apache/spark/pull/566#discussion_r14040514

--- Diff: external/flume/src/main/scala/org/apache/spark/streaming/flume/FlumeInputDStream.scala ---
@@ -134,22 +144,64 @@
 private[streaming]
 class FlumeReceiver(
     host: String,
     port: Int,
-    storageLevel: StorageLevel
+    storageLevel: StorageLevel,
+    enableDecompression: Boolean
   ) extends Receiver[SparkFlumeEvent](storageLevel) with Logging {

   lazy val responder = new SpecificResponder(
     classOf[AvroSourceProtocol], new FlumeEventServer(this))
-  lazy val server = new NettyServer(responder, new InetSocketAddress(host, port))
+  var server: NettyServer = null
+
+  private def initServer() = {
+    if (enableDecompression) {
+      val channelFactory = new NioServerSocketChannelFactory
+        (Executors.newCachedThreadPool(), Executors.newCachedThreadPool());
+      val channelPipelieFactory = new CompressionChannelPipelineFactory()
+
+      new NettyServer(
+        responder,
+        new InetSocketAddress(host, port),
+        channelFactory,
+        channelPipelieFactory,
+        null)
+    } else {
+      new NettyServer(responder, new InetSocketAddress(host, port))
+    }
+  }

   def onStart() {
-    server.start()
+    synchronized {
+      if (server == null) {
+        server = initServer()
+        server.start()
+      } else {
+        logWarning("Flume receiver being asked to start more then once with out close")
+      }
+    }
     logInfo("Flume receiver started")
   }

   def onStop() {
-    server.close()
+    synchronized {
+      if (server != null) {
+        server.close()
+        server = null
+      }
+    }
     logInfo("Flume receiver stopped")
   }

   override def preferredLocation = Some(host)
 }
+
+private[streaming]
+class CompressionChannelPipelineFactory extends ChannelPipelineFactory {
+
+  def getPipeline() = {
--- End diff --

Cool, will do before the weekend is done. Thanks
[GitHub] spark pull request: SPARK-1478: Upgrade FlumeInputDStream's FlumeR...
Github user tmalaska commented on the pull request: https://github.com/apache/spark/pull/566#issuecomment-46724152

Let me know if there is anything I can do to help this go through. Thanks tdas

On Fri, Jun 20, 2014 at 4:38 PM, Tathagata Das wrote:
> Jenkins, test this again.
> Reply to this email directly or view it on GitHub <https://github.com/apache/spark/pull/566#issuecomment-46724002>.
[GitHub] spark pull request: Spark-2173 : Add Master Computer and SuperStep...
Github user tmalaska commented on the pull request: https://github.com/apache/spark/pull/#issuecomment-46391465

Never mind, I had to close the pull request. I thought about it: the ccAccumulator is not accessible from the vprog, which was my goal. I'm going to have to use a broadcast. I will have an update tomorrow.
[GitHub] spark pull request: Spark-2173 : Add Master Computer and SuperStep...
Github user tmalaska closed the pull request at: https://github.com/apache/spark/pull/
[GitHub] spark pull request: Spark-2173 : Add Master Computer and SuperStep...
Github user tmalaska commented on the pull request: https://github.com/apache/spark/pull/#issuecomment-46391395

Wait, this isn't going to get me what I want, because I can't read the ssAccumulator in the vprog. I think I will have to change to a broadcast. I will
[GitHub] spark pull request: Spark-2173 : Add Master Computer and SuperStep...
GitHub user tmalaska opened a pull request: https://github.com/apache/spark/pull/

Spark-2173 : Add Master Computer and SuperStep ...

Add Master Computer and SuperStep Accumulator to Pregel GraphX Implementation

You can merge this pull request into a Git repository by running:
    $ git pull https://github.com/tmalaska/spark master
Alternatively you can review and apply these changes as the patch at:
    https://github.com/apache/spark/pull/.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
    This closes #

commit e174480e0c2650058007e8274b8931943c3798b0
Author: tmalaska
Date: 2014-06-18T02:59:27Z

    Spark-2173 : Add Master Computer and SuperStep ...
    Add Master Computer and SuperStep Accumulator to Pregel GraphX Implementation
[GitHub] spark pull request: SPARK-1478: Upgrade FlumeInputDStream's FlumeR...
Github user tmalaska commented on the pull request: https://github.com/apache/spark/pull/566#issuecomment-46372202

Hey tdas, I was going to do 1642 tonight, but I noticed these changes are not in the code yet. What should I do? Thanks
[GitHub] spark pull request: SPARK-1478: Upgrade FlumeInputDStream's FlumeR...
Github user tmalaska commented on the pull request: https://github.com/apache/spark/pull/566#issuecomment-42189283

LOL tdas, how's it going? Just pinging.
[GitHub] spark pull request: SPARK-1478: Upgrade FlumeInputDStream's FlumeR...
Github user tmalaska commented on the pull request: https://github.com/apache/spark/pull/566#issuecomment-41949374

Hey tdas, how is this JIRA looking? Is there anything I need to do to get it passed?
[GitHub] spark pull request: SPARK-1478: Upgrade FlumeInputDStream's FlumeR...
Github user tmalaska commented on the pull request: https://github.com/apache/spark/pull/566#issuecomment-41732028

I already updated the code and tested it. Feel free to commit unless you see anything wrong. If you commit it in the next couple of hours, I can start on SPARK-1642 tonight or tomorrow morning.
[GitHub] spark pull request: SPARK-1478
Github user tmalaska closed the pull request at: https://github.com/apache/spark/pull/405
[GitHub] spark pull request: SPARK-1478
Github user tmalaska commented on the pull request: https://github.com/apache/spark/pull/405#issuecomment-41671506

As soon as I figure out how. I will look into it after work.
[GitHub] spark pull request: SPARK-1478: Upgrade FlumeInputDStream's FlumeR...
Github user tmalaska commented on the pull request: https://github.com/apache/spark/pull/566#issuecomment-41663386

Let me know if the changes are OK. The only difference from what you told me to do is that I added a check to prevent a double start. Let me know if you want me to take it out; if so, I can make the change very fast.

    if (server == null) {
      server = initServer()
      server.start()
    } else {
      logWarning("Flume receiver being asked to start more then once with out close")
    }
[GitHub] spark pull request: SPARK-1478: Upgrade FlumeInputDStream's FlumeR...
Github user tmalaska commented on the pull request: https://github.com/apache/spark/pull/566#issuecomment-41631586

Will do. I will start tomorrow. Shouldn't take long.
[GitHub] spark pull request: SPARK-1478: Upgrade FlumeInputDStream's FlumeR...
Github user tmalaska commented on a diff in the pull request: https://github.com/apache/spark/pull/566#discussion_r12044730

--- Diff: external/flume/src/main/scala/org/apache/spark/streaming/flume/FlumeUtils.scala ---
@@ -66,6 +84,23 @@ object FlumeUtils {
     port: Int,
     storageLevel: StorageLevel
   ): JavaReceiverInputDStream[SparkFlumeEvent] = {
-    createStream(jssc.ssc, hostname, port, storageLevel)
+    createStream(jssc.ssc, hostname, port, storageLevel, false)
+  }
+
+  /**
+   * Creates a input stream from a Flume source.
+   * @param hostname Hostname of the slave machine to which the flume data will be sent
+   * @param port Port of the slave machine to which the flume data will be sent
+   * @param storageLevel Storage level to use for storing the received objects
+   * @param enableCompression Should Netty Server decode input stream from client
--- End diff --

Done
[GitHub] spark pull request: SPARK-1478: Upgrade FlumeInputDStream's FlumeR...
Github user tmalaska commented on a diff in the pull request: https://github.com/apache/spark/pull/566#discussion_r12044643

--- Diff: external/flume/src/main/scala/org/apache/spark/streaming/flume/FlumeInputDStream.scala ---
@@ -153,3 +181,15 @@ class FlumeReceiver(
   override def preferredLocation = Some(host)
 }
+
+private[streaming]
+class CompressionChannelPipelineFactory() extends ChannelPipelineFactory {
--- End diff --

Done
[GitHub] spark pull request: SPARK-1478: Upgrade FlumeInputDStream's FlumeR...
Github user tmalaska commented on a diff in the pull request: https://github.com/apache/spark/pull/566#discussion_r12044638

--- Diff: external/flume/src/test/scala/org/apache/spark/streaming/flume/FlumeStreamSuite.scala ---
@@ -85,4 +108,14 @@ class FlumeStreamSuite extends TestSuiteBase {
       assert(outputBuffer(i).head.event.getHeaders.get("test") === "header")
     }
   }
+
+  class CompressionChannelFactory(compressionLevel: Int) extends
--- End diff --

Done
[GitHub] spark pull request: SPARK-1478: Upgrade FlumeInputDStream's FlumeR...
Github user tmalaska commented on the pull request: https://github.com/apache/spark/pull/566#issuecomment-41546071

OK, I have reviewed the commits and I will be making changes this morning. Thanks tdas.
[GitHub] spark pull request: SPARK-1478: Upgrade FlumeInputDStream's FlumeR...
GitHub user tmalaska opened a pull request: https://github.com/apache/spark/pull/566

SPARK-1478: Upgrade FlumeInputDStream's FlumeReceiver to support FLUME-1915

You can merge this pull request into a Git repository by running:
    $ git pull https://github.com/tmalaska/spark master
Alternatively you can review and apply these changes as the patch at:
    https://github.com/apache/spark/pull/566.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
    This closes #566

commit 6a390690a19d4fe7d1c3c9029de66b94eb15be45
Author: tmalaska
Date: 2014-04-26T13:17:02Z

    Finished Second draft
[GitHub] spark pull request: SPARK-1584: Upgrade Flume dependency to 1.4.0
Github user tmalaska commented on the pull request: https://github.com/apache/spark/pull/507#issuecomment-41344456

Uploaded the exclude-all change, and I ran an assembly and test-quick and it worked. Let me know what I should do next. Thanks again for the help.
[GitHub] spark pull request: SPARK-1584: Upgrade Flume dependency to 1.4.0
Github user tmalaska commented on the pull request: https://github.com/apache/spark/pull/507#issuecomment-41335079

Good point @srowen, we may be able to just exclude thrift altogether. All we need is the avro source stuff. I will exclude it from both and see if it works.
[GitHub] spark pull request: SPARK-1584: Upgrade Flume dependency to 1.4.0
Github user tmalaska commented on the pull request: https://github.com/apache/spark/pull/507#issuecomment-41331043

So what should we do? Flume 1.2.0 is even worse, at Thrift 6.1. There are some people still on Hadoop 1, but most are on Hadoop 2 now.
[GitHub] spark pull request: SPARK-1584: Upgrade Flume dependency to 1.4.0
Github user tmalaska commented on the pull request: https://github.com/apache/spark/pull/507#issuecomment-41323794

@srowen yes, I agree. Yes, I missed that one. Maven will do 7 and sbt will do 8; I will move the Maven one to 8 as well.
[GitHub] spark pull request: SPARK-1584: Upgrade Flume dependency to 1.4.0
Github user tmalaska commented on the pull request: https://github.com/apache/spark/pull/507#issuecomment-41319131

@pwendell, to be honest this is a little deeper than I normally go with pom specification. I think where we are is this: there is a behavior that Maven has and a behavior that sbt has, and they are not the same. My goal for this pull request was to produce the same outcome for both Maven and sbt. In the end thrift will be included anyway, because Flume would have pulled it in. It's not a perfect solution, but the only other option is to change the pom in Flume 1.4.0, and they have a different requirement of supporting two thrift options; I'm not even sure how Flume would honor that requirement without profiles. Also, I figured it would be good to have Flume 1.4.0 in Spark 1.0, because Flume 1.4.0 is the most commonly used Flume out there and it has some really cool functionality I would like to add to the FlumeStream, like compression and encryption.
[GitHub] spark pull request: SPARK-1584: Upgrade Flume dependency to 1.4.0
Github user tmalaska commented on the pull request: https://github.com/apache/spark/pull/507#issuecomment-41317846 Yes, this is my first Spark commit, so I'm going to make some mistakes. :)
[GitHub] spark pull request: SPARK-1584: Upgrade Flume dependency to 1.4.0
Github user tmalaska commented on the pull request: https://github.com/apache/spark/pull/507#issuecomment-41316766 @berngp I updated the pull request name as you recommended.
[GitHub] spark pull request: SPARK-1584
Github user tmalaska commented on the pull request: https://github.com/apache/spark/pull/507#issuecomment-41316558 So the tags didn't make it through the last one. Here are the relevant parts of the pom.xml again:

    <profile>
      <id>hadoop-2</id>
      <activation>
        <property>
          <name>hadoop.profile</name>
          <value>2</value>
        </property>
      </activation>
    </profile>

    <profile>
      <id>hadoop-1.0</id>
      <activation>
        <property>
          <name>!hadoop.profile</name>
        </property>
      </activation>
    </profile>
[GitHub] spark pull request: SPARK-1584
Github user tmalaska commented on the pull request: https://github.com/apache/spark/pull/507#issuecomment-41316256 @pwendell I thought that too, but then I noticed the following two parts of the Flume 1.4.0 pom file:

    <!-- hadoop-1.0 -->
    <name>!hadoop.profile</name>

    <!-- hadoop-2 -->
    <name>hadoop.profile</name>
    <value>2</value>

From what I understand, if the property hadoop.profile is equal to 2 then the hadoop-2 profile is used; otherwise hadoop-1.0 is used. So I'm still of the belief that this is an sbt bug.
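To make the activation rule above concrete, here is a small illustrative Scala sketch. It models Maven's property-based activation for these two profiles only; the object, function name, and the "no-profile-active" result are invented for illustration and are not Maven API.

```scala
// Illustrative model of Maven property-based profile activation for the
// two Flume 1.4.0 profiles discussed above. Not Maven code, just the rule:
//   hadoop-2    activates when hadoop.profile is set to "2"
//   hadoop-1.0  activates when hadoop.profile is NOT set ("!hadoop.profile")
object ProfileActivation {
  def activeProfile(props: Map[String, String]): String =
    props.get("hadoop.profile") match {
      case Some("2") => "hadoop-2"
      case None      => "hadoop-1.0"
      case Some(_)   => "no-profile-active" // neither activation rule matches
    }

  def main(args: Array[String]): Unit = {
    println(activeProfile(Map("hadoop.profile" -> "2"))) // hadoop-2
    println(activeProfile(Map.empty))                    // hadoop-1.0
  }
}
```

Note that a value other than 2 (e.g. -Dhadoop.profile=1) would activate neither profile, since hadoop-1.0 requires the property to be absent.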
[GitHub] spark pull request: SPARK-1584
Github user tmalaska commented on a diff in the pull request: https://github.com/apache/spark/pull/507#discussion_r11960961 --- Diff: project/SparkBuild.scala --- @@ -605,7 +606,8 @@ object SparkBuild extends Build { name := "spark-streaming-flume", previousArtifact := sparkPreviousArtifact("spark-streaming-flume"), libraryDependencies ++= Seq( - "org.apache.flume" % "flume-ng-sdk" % "1.2.0" % "compile" excludeAll(excludeNetty) + "org.apache.flume" % "flume-ng-sdk" % "1.4.0" % "compile" excludeAll(excludeNetty, excludeThrift), + "org.apache.thrift" % "libthrift" % "0.8.0" % "compile" --- End diff -- See comment below made at 1:41 EST April 14 2014
[GitHub] spark pull request: SPARK-1584
Github user tmalaska commented on the pull request: https://github.com/apache/spark/pull/507#issuecomment-41309801 @pwendell this is a great question. The answer is that there is either a bug in sbt or I'm missing something in SparkBuild.scala. In the Flume 1.4.0 pom.xml there is a dependency on Thrift, but the version is declared with a property, and that property is defined inside a profile. I'm not sure if the issue is related to the property, the profile, or the combination, but sbt does not resolve the value of the Thrift version property, and I get the following exception:

    sbt.ResolveException: unresolved dependency: org.apache.thrift#libthrift;${thrift.version}: not found

Maven works just fine, so I left that as is. Given my limited understanding of sbt and why it was failing, I decided to exclude the Thrift dependency coming from Flume 1.4.0 and declare it directly in the SparkBuild.scala file. I'm open to any and all help here; I don't know enough about sbt to know why it is having trouble with this. Side note: sbt works fine with Flume 1.3.0, because in Flume 1.3.0 the Thrift version is hard-coded in the Flume pom.xml. Flume 1.4.0 introduced the property value.
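The workaround described above amounts to an sbt build fragment along these lines. This is a sketch based on the diff in this PR; the `ExclusionRule` definitions are assumptions inferred from the `excludeNetty`/`excludeThrift` names used in the build file.

```scala
// Sketch of the workaround: exclude libthrift as pulled in by flume-ng-sdk
// (its version comes from a profile-defined property that sbt cannot resolve)
// and pin a concrete libthrift version directly instead.
val excludeNetty  = ExclusionRule(organization = "org.jboss.netty")
val excludeThrift = ExclusionRule(organization = "org.apache.thrift")

libraryDependencies ++= Seq(
  "org.apache.flume"  % "flume-ng-sdk" % "1.4.0" % "compile"
    excludeAll(excludeNetty, excludeThrift),
  "org.apache.thrift" % "libthrift"    % "0.8.0" % "compile"
)
```

The effect is that both Maven and sbt resolve the same concrete libthrift version, rather than depending on sbt's handling of Maven profile properties.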
[GitHub] spark pull request: SPARK-1584
Github user tmalaska commented on the pull request: https://github.com/apache/spark/pull/507#issuecomment-41283871 OK this should be good now. Please review.
[GitHub] spark pull request: SPARK-1584
Github user tmalaska commented on the pull request: https://github.com/apache/spark/pull/507#issuecomment-41164979 There is a build issue related to thrift.version. This pull request should be considered a work in progress. Researching now.
[GitHub] spark pull request: SPARK-1584
GitHub user tmalaska opened a pull request: https://github.com/apache/spark/pull/507 SPARK-1584 Updated the Flume dependency in the maven pom file and the scala build file. You can merge this pull request into a Git repository by running: $ git pull https://github.com/tmalaska/spark master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/507.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #507 commit 5bf56a7152c2ae75d2e554c4f47a6946f8bf2ab0 Author: tmalaska Date: 2014-04-23T13:40:43Z Upgrade flume version
[GitHub] spark pull request: SPARK-1478
Github user tmalaska commented on the pull request: https://github.com/apache/spark/pull/405#issuecomment-40421425 Yeah, no problem. Thanks for taking the time to review my code. This is my first time committing Scala. :) Just let me know when ( #300 ) is done and I will check it out again. Also, when you have time, I would love to know how else I can help. I was thinking of adding: - encryption to the FlumeStream, as in Flume 1.4.0. - failure-recovery support, so that when a FlumeStream host goes down, Spark starts the FlumeStream on another node.
[GitHub] spark pull request: SPARK-1478
GitHub user tmalaska opened a pull request: https://github.com/apache/spark/pull/405 SPARK-1478 Initial Version You can merge this pull request into a Git repository by running: $ git pull https://github.com/tmalaska/spark master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/405.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #405 commit c433827db5dfda6f5b1b6aa11e45447525b4aac4 Author: tmalaska Date: 2014-04-14T17:37:01Z SPARK-1478