Re: sampling function
Hi Do, DataSet provides a stable @Public interface. DataSetUtils is marked @PublicEvolving which is intended for public use, has stable behavior, but method signatures may change. It's also good to limit DataSet to common methods whereas the utility methods tend to be used for specific applications. I don't have the pulse of streaming but this sounds like a useful feature that could be added. Greg On Sat, Jul 9, 2016 at 10:47 AM, Le Quoc Dowrote: > Hi all, > > I'm working on approximate computing using sampling techniques. I > recognized that Flink supports the sample function for Dataset > (org/apache/flink/api/java/utils/DataSetUtils.java). I'm just wondering why > you didn't merge the function to org/apache/flink/api/java/DataSet.java > since the sample function works as a transformation operator? > > The second question is that are you planning to support the sample > function for DataStream (within windows) since I did not see it in > DataStream code ? > > Thank you, > Do >
sampling function
Hi all, I'm working on approximate computing using sampling techniques. I recognized that Flink supports the sample function for Dataset (org/apache/flink/api/java/utils/DataSetUtils.java). I'm just wondering why you didn't merge the function to org/apache/flink/api/java/DataSet.java since the sample function works as a transformation operator? The second question is that are you planning to support the sample function for DataStream (within windows) since I did not see it in DataStream code ? Thank you, Do
Re: Random access to small global state
U could use ignite too, I believe they have a plugin for flink streaming. Sent from my iPhone > On Jul 9, 2016, at 8:05 AM, Sebastianwrote: > > Hi, > > I'm planning to work on a streaming recommender in Flink, and one problem > that I have is that the algorithm needs random access to a small global state > (say a million counts). It should be ok if there is some inconsistency in the > state (e.g., delay in seeing updates). > > Does anyone here have experience with such things? I'm thinking of connecting > Flink to a lighweight in-memory key-value store such as memcache for that. > > Best, > Sebastian
Random access to small global state
Hi, I'm planning to work on a streaming recommender in Flink, and one problem that I have is that the algorithm needs random access to a small global state (say a million counts). It should be ok if there is some inconsistency in the state (e.g., delay in seeing updates). Does anyone here have experience with such things? I'm thinking of connecting Flink to a lighweight in-memory key-value store such as memcache for that. Best, Sebastian
Re: Extract type information from SortedMap
Hi Robert, On 9 July 2016 at 00:25, Robert Metzgerwrote: > Hi Yukun, > > can you also post the code how you are invoking the GenericFlatMapper on > the mailing list? > Here is the code defining the topology: DataStream stream = ...; stream .keyBy(new KeySelector () { @Override public Integer getKey(String x) throws Exception { return x.hashCode() % 10; } }) .timeWindow(Time.seconds(10)) .fold(new TreeMap (), new FoldFunction >() { @Override public SortedMap fold(SortedMap map, String x) { Long current = map.get(x); Long updated = current != null ? current + 1 : 1; map.put(x, updated); return map; } }) .flatMap(new GenericFlatMapper()) .returns(new TypeHint >(){}.getTypeInfo()) // throws InvalidTypesException if you comment out this line .print(); > > The Java compiler is usually dropping the generic types during compilation > ("type erasure"), that's why we can not infer the types. > > The error message implies type extraction should be possible when "all variables in the return type can be deduced from the input type(s)". This is true for flatMap(Tuple2 , Collector >), but if the signature is changed to void flatMap(SortedMap , Collector >), type inference fails. > > On Fri, Jul 8, 2016 at 12:27 PM, Yukun Guo wrote: > >> Hi, >> When I run the code implementing a generic FlatMapFunction, Flink >> complained about InvalidTypesException: >> >> public class GenericFlatMapper implements FlatMapFunction > Long>, Tuple2 > { >> @Override >> public void flatMap(SortedMap m, Collector > >> out) throws Exception { >> for (Map.Entry entry : m.entrySet()) { >> out.collect(Tuple2.of(entry.getKey(), entry.getValue())); >> } >> } >> } >> >> >> *Exception in thread "main" >> org.apache.flink.api.common.functions.InvalidTypesException: The return >> type of function could not be determined automatically, due to type >> erasure. You can give type information hints by using the returns(...) >> method on the result of the transformation call, or by letting your >> function implement the 'ResultTypeQueryable' interface.* >> >> *...* >> *Caused by: org.apache.flink.api.common.functions.InvalidTypesException: >> Type of TypeVariable 'T' in 'class GenericFlatMapper' could not be >> determined. This is most likely a type erasure problem. The type extraction >> currently supports types with generic variables only in cases where all >> variables in the return type can be deduced from the input type(s).* >> >> This puzzles me as Flink should be able to infer the type from arguments. >> I know returns(...) or other workarounds to give type hint, but they are >> kind of verbose. Any suggestions? >> >> >
Modifying start-cluster scripts to efficiently spawn multiple TMs
Hi, The current start/stop scripts SSH worker nodes each time they appear in the slaves file. When spawning multiple TMs (like 24 per node), this is very inefficient. I've changed the scripts to do one SSH per node and spawn a given N number of TMs afterwards. I can make a pull request if this seems usable to others. For now, I assume slaves file will indicate the number of TMs per slave in "IP N" format. Thank you, Saliya -- Saliya Ekanayake Ph.D. Candidate | Research Assistant School of Informatics and Computing | Digital Science Center Indiana University, Bloomington