Want to test spark-sql-kafka but get unresolved dependency error

2016-10-13 Thread JayKay
I want to work with the Kafka integration for Structured Streaming. I use
Spark version 2.0.0 and start the spark-shell with:

spark-shell --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.0.0

As described here:
https://github.com/apache/spark/blob/master/docs/structured-streaming-kafka-integration.md

But I get an unresolved dependency error ("unresolved dependency:
org.apache.spark#spark-sql-kafka-0-10_2.11;2.0.0: not found"). So it seems
not to be available via Maven or spark-packages.
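
Is the artifact maybe only published for a newer release than 2.0.0? The only
workaround I can think of is to check the Maven Central listing for the
connector and, if a newer version shows up there, to pin that version
explicitly (just a guess on my part):

# check which versions of the connector actually exist on Maven Central:
#   https://repo1.maven.org/maven2/org/apache/spark/spark-sql-kafka-0-10_2.11/
spark-shell --packages org.apache.spark:spark-sql-kafka-0-10_2.11:<version-listed-there>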

How can I access this package? Or am I doing something wrong/missing?

Thank you for your help.






Re: Sharing object/state across transformations

2015-12-10 Thread JayKay
I solved the problem by passing the HLL object into the update function,
updating it there and returning it as the new state. That was obviously a
mental block on my part... ;-)
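
For anyone who finds this later, a rough sketch of the shape of the fix (not
my exact code; the log2m value of 14 is just an example):

import java.util.List;
import com.clearspring.analytics.stream.cardinality.HyperLogLog;
import com.google.common.base.Optional;
import org.apache.spark.api.java.function.Function2;

// The HLL now lives inside the state, so every batch updates and returns the
// same estimator instead of touching a fresh driver-side copy.
Function2<List<String>, Optional<HyperLogLog>, Optional<HyperLogLog>> hllStateFunction =
    new Function2<List<String>, Optional<HyperLogLog>, Optional<HyperLogLog>>() {
        public Optional<HyperLogLog> call(List<String> values, Optional<HyperLogLog> state) {
            HyperLogLog hll = state.isPresent() ? state.get() : new HyperLogLog(14);
            for (String value : values) {
                hll.offer(value);
            }
            return Optional.of(hll);
        }
    };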






Sharing object/state across transformations

2015-12-02 Thread JayKay
I'm an absolute beginner with Apache Spark. I'm playing around with Spark
Streaming (API version 1.5.1) in Java and want to implement a prototype that
uses HyperLogLog to estimate distinct elements. I use stream-lib from
Clearspring (https://github.com/addthis/stream-lib).
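
For context, the basic stream-lib usage I'm building on looks roughly like
this (the log2m value of 14 is arbitrary):

import com.clearspring.analytics.stream.cardinality.HyperLogLog;

HyperLogLog hll = new HyperLogLog(14);   // precision parameter log2m
hll.offer("visitor-1");                  // add elements to the estimator
hll.offer("visitor-2");
long estimate = hll.cardinality();       // estimated number of distinct elements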

I planned to use updateStateByKey to hold a global state over all events.
The problem is that on every call of the update function my HLL returns 1
(it seems to use a new instance of my HLL object every time). The same
problem occurs with a simple global integer variable that I tried to
increment in every function call: it also always keeps its initial value.

This is a code snippet where I define the update function:

Function2<List<String>, Optional<Long>, Optional<Long>> hllCountFunction =
    new Function2<List<String>, Optional<Long>, Optional<Long>>() {
        public Optional<Long> call(List<String> values, Optional<Long> state)
                throws Exception {
            values.stream().forEach(value -> hll.offer(value));
            long newState = state.isPresent() ? hll.cardinality() : 0;
            return Optional.of(newState);
        }
    };


And this is the snippet showing how I use the function:

JavaPairDStream<String, Long> hllCounts = fullvisitorids.mapToPair(
    new PairFunction<String, String, String>() {
        public Tuple2<String, String> call(String value) {
            return new Tuple2<String, String>("key", value);
        }
    }).updateStateByKey(hllCountFunction);

After a lot of research I found the concept of Accumulators. Do I need to
define a custom Accumulator by extending the Accumulator class (in Java)? I
also read that in transformations accumulators should only be used for
debugging purposes...

So how can I manage to use one globally defined HLL object in a Spark
Streaming transformation? I also tried to implement a custom Accumulator,
but that failed as well because I don't understand how to use the
AccumulableParam interface. I implemented the Accumulator and overrode the
add and value methods, but what do I have to do in the AccumulableParam with
addAccumulator, addInPlace and zero?
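
To make the question concrete, this is roughly how I imagine the three
methods would map onto an HLL (my guess, not working code):

import com.clearspring.analytics.stream.cardinality.CardinalityMergeException;
import com.clearspring.analytics.stream.cardinality.HyperLogLog;
import org.apache.spark.AccumulableParam;

class HllAccumulableParam implements AccumulableParam<HyperLogLog, String> {
    // add a single element to the partial HLL on an executor
    public HyperLogLog addAccumulator(HyperLogLog hll, String value) {
        hll.offer(value);
        return hll;
    }
    // merge two partial HLLs (e.g. from different partitions)
    public HyperLogLog addInPlace(HyperLogLog h1, HyperLogLog h2) {
        try {
            h1.addAll(h2);
        } catch (CardinalityMergeException e) {
            throw new RuntimeException(e);
        }
        return h1;
    }
    // the neutral starting value
    public HyperLogLog zero(HyperLogLog initial) {
        return new HyperLogLog(14); // same precision as elsewhere, chosen arbitrarily
    }
}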

Thanks in advance for your help and your advice!



