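Ordering would be on a per-partition basis, not global: the iterator yields each partition's records in that partition's order, but separate partitions are processed independently and possibly in parallel. You typically want to acquire resources (such as a connection from your pool) inside the foreachPartition closure, just before handling the iterator, so that it happens on the executor rather than on the driver.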
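
A minimal sketch of that pattern, written as plain Java with no Spark or Kafka dependency so it can be run standalone. FakeConnection, ConnectionPool and handlePartition are hypothetical names made up for illustration, not Spark or Kafka APIs; the body of handlePartition is roughly what would go inside VoidFunction<Iterator<String>>.call(...).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class PartitionSketch {

    /** Hypothetical stand-in for a Kafka producer: it only records what it is asked to send. */
    static final class FakeConnection {
        final List<String> sent = new ArrayList<>();

        void send(String record) {
            sent.add(record);
        }
    }

    /** Hypothetical static pool; being static, there is one per JVM (one per executor in Spark). */
    static final class ConnectionPool {
        private static final BlockingQueue<FakeConnection> POOL = new ArrayBlockingQueue<>(2);

        static {
            POOL.add(new FakeConnection());
            POOL.add(new FakeConnection());
        }

        static FakeConnection borrow() {
            FakeConnection c = POOL.poll();
            if (c == null) {
                throw new IllegalStateException("connection pool exhausted");
            }
            return c;
        }

        static void giveBack(FakeConnection c) {
            POOL.offer(c);
        }

        static int available() {
            return POOL.size();
        }
    }

    /** Handles one partition and returns the records it sent, in the order it sent them. */
    static List<String> handlePartition(Iterator<String> records) {
        // Acquire the resource here, inside the per-partition code, not outside the closure.
        FakeConnection conn = ConnectionPool.borrow();
        try {
            int start = conn.sent.size();
            // The iterator is consumed sequentially, so records keep their partition order.
            while (records.hasNext()) {
                conn.send(records.next());
            }
            return new ArrayList<>(conn.sent.subList(start, conn.sent.size()));
        } finally {
            // Always hand the connection back, even if sending failed.
            ConnectionPool.giveBack(conn);
        }
    }

    public static void main(String[] args) {
        List<String> partition = Arrays.asList("t=1", "t=2", "t=3");
        System.out.println(handlePartition(partition.iterator())); // prints [t=1, t=2, t=3]
        System.out.println(ConnectionPool.available());            // prints 2
    }
}
```

Borrowing once per partition rather than once per record keeps the pool traffic low, and the try/finally makes sure a failed partition does not leak a connection.
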
http://spark.apache.org/docs/latest/streaming-programming-guide.html#design-patterns-for-using-foreachrdd

On Mon, Nov 16, 2015 at 4:02 PM, Nipun Arora <nipunarora2...@gmail.com> wrote:

> Hi,
> I wanted to understand forEachPartition logic. In the code below, I am
> assuming the iterator is executing in a distributed fashion.
>
> 1. Assuming I have a stream which has timestamp data which is sorted. Will
> the stringiterator in foreachPartition process each line in order?
>
> 2. Assuming I have a static pool of Kafka connections, where should I get
> a connection from a pool to be used to send data to Kafka?
>
> addMTSUnmatched.foreachRDD(
>     new Function<JavaRDD<String>, Void>() {
>         @Override
>         public Void call(JavaRDD<String> stringJavaRDD) throws Exception {
>             stringJavaRDD.foreachPartition(
>                 new VoidFunction<Iterator<String>>() {
>                     @Override
>                     public void call(Iterator<String> stringIterator) throws Exception {
>                         while (stringIterator.hasNext()) {
>                             String str = stringIterator.next();
>                             if (OnlineUtils.ESFlag) {
>                                 OnlineUtils.printToFile(str, 1, type1_outputFile, OnlineUtils.client);
>                             } else {
>                                 OnlineUtils.printToFile(str, 1, type1_outputFile);
>                             }
>                         }
>                     }
>                 }
>             );
>             return null;
>         }
>     }
> );
>
> Thanks
> Nipun