Henryk Konsek created CAMEL-9385:
------------------------------------

             Summary: Create Apache Spark component
                 Key: CAMEL-9385
                 URL: https://issues.apache.org/jira/browse/CAMEL-9385
             Project: Camel
          Issue Type: New Feature
            Reporter: Henryk Konsek
            Assignee: Henryk Konsek
             Fix For: 2.17.0


As part of the IoT project I'm working on, I have created a Spark 
component (1) that makes it easier to handle analytics requests from devices. I 
would like to donate this code to Apache Camel and extend it here, as I expect 
many people would be interested in using Spark from Camel.

The URI looks like {{spark:rdd/rddName/rddCallback}} or 
{{spark:dataframe/frameName/frameCallback}}, depending on whether you would like 
to work with RDDs or DataFrames.

The idea here is that the Camel route acts as a driver application. You specify 
RDD/DataFrame definitions (and callbacks to act against them) in a registry 
(for example as Spring beans or OSGi services). Then you send the parameters for 
the computation as the body of a message.

For example, in Spring Boot you specify the RDD and the callback as:

{code}
@Bean
JavaRDD<String> myRdd(JavaSparkContext sparkContext) {
  return sparkContext.textFile("foo.txt");
}

@Bean("MyAnalytics")
MyAnalytics myAnalytics() {
  return new MyAnalytics();
}

class MyAnalytics {

  @RddCallback
  long countLines(JavaRDD<String> textFile, long argument) {
    return textFile.count() * argument;
  }

}
{code}

Then you ask for the results of computations:

{code}
long results = producerTemplate.requestBody("spark:rdd/myRdd/MyAnalytics", 10, long.class);
{code}
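
The DataFrame flavor could work analogously. The sketch below is an assumption about how that might look, mirroring the RDD case: the {{@DataFrameCallback}} annotation, the {{myFrame}}/{{cars}} names, and the use of a {{HiveContext}} are all hypothetical here, not confirmed API:

{code}
@Bean
DataFrame myFrame(HiveContext hiveContext) {
  // hypothetical DataFrame definition, registered under the name "myFrame"
  return hiveContext.table("cars");
}

class MyFrameAnalytics {

  @DataFrameCallback
  long countModel(DataFrame cars, String model) {
    // message body ("model") becomes the callback argument
    return cars.filter("model = '" + model + "'").count();
  }

}
{code}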

Such a setup is extremely useful for bridging Spark computations to different 
transports.
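
For instance, the computation above could be exposed over HTTP with a route 
like the following sketch (the Jetty endpoint, port, and body conversions are 
my assumptions for illustration, not part of the component):

{code}
// hypothetical bridging route: HTTP request body -> Spark driver -> HTTP response
from("jetty:http://0.0.0.0:8080/analytics")
  .convertBodyTo(long.class)                 // request body becomes the callback argument
  .to("spark:rdd/myRdd/MyAnalytics")         // runs the registered RDD callback
  .convertBodyTo(String.class);              // computation result becomes the HTTP response
{code}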

(1) https://github.com/rhiot/rhiot/tree/master/components/camel-spark



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)