Chris,

In the current state, I just want Samza to connect to 127.0.0.1.

I have set YARN_HOME:

  $ echo $YARN_HOME
  /home/ctippur/hello-samza/deploy/yarn

I still don't see anything on the hadoop console. Also, I see this during
startup:

  SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
  SLF4J: Defaulting to no-operation (NOP) logger implementation
  SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
  SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
  SLF4J: Defaulting to no-operation (NOP) logger implementation
  SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.

- Shekar
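The SLF4J lines above mean no logger binding is on the job's classpath, so all
log output is silently discarded; this matches Chris's guess further down the
thread about a missing slf4j-log4j jar. A minimal sketch of the fix, assuming
the job is built with Maven and log4j is the intended backend (the version
number is illustrative, not from the thread):

  <!-- pom.xml: SLF4J -> log4j binding, so StaticLoggerBinder resolves -->
  <dependency>
    <groupId>org.slf4j</groupId>
    <artifactId>slf4j-log4j12</artifactId>
    <version>1.7.7</version>
    <scope>runtime</scope>
  </dependency>

With a binding on the classpath, the log4j.xml quoted later in the thread
takes effect and the container actually writes to ${samza.log.dir}.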
On Tue, Sep 2, 2014 at 2:20 PM, Chris Riccomini
<criccom...@linkedin.com.invalid> wrote:

> Hey Shekar,
>
> Ah ha. In that case, do you expect your SamzaContainer to try to connect
> to the RM at 127.0.0.1, or do you expect it to try to connect to some
> remote RM? If you expect it to try and connect to a remote RM, it's not
> doing that. Perhaps because YARN_HOME isn't set.
>
> If you go to your RM's web interface, how many active nodes do you see
> listed?
>
> Cheers,
> Chris
>
> On 9/2/14 2:17 PM, "Shekar Tippur" <ctip...@gmail.com> wrote:
>
>> Chris ..
>>
>> I am using a RHEL server to host all the components (Yarn, kafka,
>> samza). I don't have ACLs open to wikipedia. I am following ..
>> http://samza.incubator.apache.org/learn/tutorials/latest/run-hello-samza-without-internet.html
>>
>> - Shekar
>>
>> On Tue, Sep 2, 2014 at 2:13 PM, Chris Riccomini
>> <criccom...@linkedin.com.invalid> wrote:
>>
>>> Hey Shekar,
>>>
>>> Can you try changing that to:
>>>
>>> http://127.0.0.1:8088/cluster
>>>
>>> And see if you can connect?
>>>
>>> Cheers,
>>> Chris
>>>
>>> On 9/2/14 1:21 PM, "Shekar Tippur" <ctip...@gmail.com> wrote:
>>>
>>>> Other observation is ..
>>>>
>>>> http://10.132.62.185:8088/cluster shows that no applications are
>>>> running.
>>>>
>>>> - Shekar
>>>>
>>>> On Tue, Sep 2, 2014 at 1:15 PM, Shekar Tippur <ctip...@gmail.com>
>>>> wrote:
>>>>
>>>>> Yarn seems to be running ..
>>>>>
>>>>> yarn 5462 0.0 2.0 1641296 161404 ? Sl Jun20 95:26
>>>>> /usr/java/jdk1.6.0_31/bin/java -Dproc_resourcemanager -Xmx1000m
>>>>> -Dhadoop.log.dir=/var/log/hadoop-yarn
>>>>> -Dyarn.log.dir=/var/log/hadoop-yarn
>>>>> -Dhadoop.log.file=yarn-yarn-resourcemanager-pppdc9prd2y2.log
>>>>> -Dyarn.log.file=yarn-yarn-resourcemanager-pppdc9prd2y2.log
>>>>> -Dyarn.home.dir=/usr/lib/hadoop-yarn
>>>>> -Dhadoop.home.dir=/usr/lib/hadoop-yarn
>>>>> -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA
>>>>> -Djava.library.path=/usr/lib/hadoop/lib/native -classpath
>>>>> /etc/hadoop/conf:/etc/hadoop/conf:/etc/hadoop/conf:/usr/lib/hadoop/lib/*:/usr/lib/hadoop/.//*:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/.//*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-mapreduce/lib/*:/usr/lib/hadoop-mapreduce/.//*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-yarn/lib/*:/etc/hadoop/conf/rm-config/log4j.properties
>>>>> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager
>>>>>
>>>>> I can tail the kafka topic as well ..
>>>>>
>>>>> deploy/kafka/bin/kafka-console-consumer.sh --zookeeper localhost:2181
>>>>> --topic wikipedia-raw
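Chris's active-nodes question can also be checked from the shell; a quick
sketch, assuming the stock Hadoop 2.x yarn CLI and the config path visible in
the ps output above:

  # NodeManagers registered with the RM; zero nodes means no container can start
  yarn node -list
  # confirm which RM address the client-side config actually points at
  grep -B1 -A2 resourcemanager /etc/hadoop/conf/yarn-site.xml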
>>>>>
>>>>> On Tue, Sep 2, 2014 at 1:04 PM, Chris Riccomini
>>>>> <criccom...@linkedin.com.invalid> wrote:
>>>>>
>>>>>> Hey Shekar,
>>>>>>
>>>>>> It looks like your job is hanging trying to connect to the RM on your
>>>>>> localhost. I thought that you said that your job was running in local
>>>>>> mode. If so, it should be using the LocalJobFactory. If not, and you
>>>>>> intend to run on YARN, is your YARN RM up and running on localhost?
>>>>>>
>>>>>> Cheers,
>>>>>> Chris
>>>>>>
>>>>>> On 9/2/14 12:22 PM, "Shekar Tippur" <ctip...@gmail.com> wrote:
>>>>>>
>>>>>>> Chris ..
>>>>>>>
>>>>>>> $ cat ./deploy/samza/undefined-samza-container-name.log
>>>>>>>
>>>>>>> 2014-09-02 11:17:58 JobRunner [INFO] job factory:
>>>>>>> org.apache.samza.job.yarn.YarnJobFactory
>>>>>>> 2014-09-02 11:17:59 ClientHelper [INFO] trying to connect to RM
>>>>>>> 127.0.0.1:8032
>>>>>>> 2014-09-02 11:17:59 NativeCodeLoader [WARN] Unable to load
>>>>>>> native-hadoop library for your platform... using builtin-java
>>>>>>> classes where applicable
>>>>>>> 2014-09-02 11:17:59 RMProxy [INFO] Connecting to ResourceManager at
>>>>>>> /127.0.0.1:8032
>>>>>>>
>>>>>>> and Log4j ..
>>>>>>>
>>>>>>> <!DOCTYPE log4j:configuration SYSTEM "log4j.dtd">
>>>>>>> <log4j:configuration xmlns:log4j="http://jakarta.apache.org/log4j/">
>>>>>>>   <appender name="RollingAppender"
>>>>>>>             class="org.apache.log4j.DailyRollingFileAppender">
>>>>>>>     <param name="File"
>>>>>>>            value="${samza.log.dir}/${samza.container.name}.log" />
>>>>>>>     <param name="DatePattern" value="'.'yyyy-MM-dd" />
>>>>>>>     <layout class="org.apache.log4j.PatternLayout">
>>>>>>>       <param name="ConversionPattern"
>>>>>>>              value="%d{yyyy-MM-dd HH:mm:ss} %c{1} [%p] %m%n" />
>>>>>>>     </layout>
>>>>>>>   </appender>
>>>>>>>   <root>
>>>>>>>     <priority value="info" />
>>>>>>>     <appender-ref ref="RollingAppender"/>
>>>>>>>   </root>
>>>>>>> </log4j:configuration>
>>>>>>>
>>>>>>> On Tue, Sep 2, 2014 at 12:02 PM, Chris Riccomini
>>>>>>> <criccom...@linkedin.com.invalid> wrote:
>>>>>>>
>>>>>>>> Hey Shekar,
>>>>>>>>
>>>>>>>> Can you attach your log files? I'm wondering if it's a
>>>>>>>> mis-configured log4j.xml (or missing slf4j-log4j jar), which is
>>>>>>>> leading to nearly empty log files. Also, I'm wondering if the job
>>>>>>>> starts fully. Anything you can attach would be helpful.
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Chris
>>>>>>>>
>>>>>>>> On 9/2/14 11:43 AM, "Shekar Tippur" <ctip...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> I am running in local mode.
>>>>>>>>>
>>>>>>>>> S
>>>>>>>>>
>>>>>>>>> On Sep 2, 2014 11:42 AM, "Yan Fang" <yanfang...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Shekar.
>>>>>>>>>>
>>>>>>>>>> Are you running the job in local mode or yarn mode? If yarn mode,
>>>>>>>>>> the log is in the yarn's container log.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> Fang, Yan
>>>>>>>>>> yanfang...@gmail.com
>>>>>>>>>> +1 (206) 849-4108
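The container log above says the job factory is
org.apache.samza.job.yarn.YarnJobFactory, which contradicts "running in local
mode": the factory is whatever the job's properties file names. A minimal
sketch of the relevant lines, assuming a hello-samza-style properties file
(file name and package path are illustrative; class names follow the
0.7.0-era docs, so adjust to your Samza version):

  # wikipedia-feed.properties (fragment)
  # local mode: run the container in-process, no RM involved
  job.factory.class=org.apache.samza.job.local.LocalJobFactory
  # yarn mode: submit to the RM instead
  #job.factory.class=org.apache.samza.job.yarn.YarnJobFactory
  #yarn.package.path=file:///home/ctippur/hello-samza/target/hello-samza-dist.tar.gz

With the YARN factory, the RM address is typically taken from the Hadoop
config on the classpath, which is why YARN_HOME matters here.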
>>>>>>>>>>
>>>>>>>>>> On Tue, Sep 2, 2014 at 11:36 AM, Shekar Tippur <ctip...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Chris,
>>>>>>>>>>>
>>>>>>>>>>> Got some time to play around a bit more. I tried to edit
>>>>>>>>>>> samza-wikipedia/src/main/java/samza/examples/wikipedia/task/WikipediaFeedStreamTask.java
>>>>>>>>>>> to add a logger info statement to tap the incoming message. I
>>>>>>>>>>> don't see the messages being printed to the log file.
>>>>>>>>>>>
>>>>>>>>>>> Is this the right place to start?
>>>>>>>>>>>
>>>>>>>>>>> public class WikipediaFeedStreamTask implements StreamTask {
>>>>>>>>>>>
>>>>>>>>>>>   private static final SystemStream OUTPUT_STREAM =
>>>>>>>>>>>       new SystemStream("kafka", "wikipedia-raw");
>>>>>>>>>>>
>>>>>>>>>>>   private static final Logger log =
>>>>>>>>>>>       LoggerFactory.getLogger(WikipediaFeedStreamTask.class);
>>>>>>>>>>>
>>>>>>>>>>>   @Override
>>>>>>>>>>>   public void process(IncomingMessageEnvelope envelope,
>>>>>>>>>>>       MessageCollector collector, TaskCoordinator coordinator) {
>>>>>>>>>>>     Map<String, Object> outgoingMap =
>>>>>>>>>>>         WikipediaFeedEvent.toMap((WikipediaFeedEvent) envelope.getMessage());
>>>>>>>>>>>     log.info(envelope.getMessage().toString());
>>>>>>>>>>>     collector.send(new OutgoingMessageEnvelope(OUTPUT_STREAM, outgoingMap));
>>>>>>>>>>>   }
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Aug 25, 2014 at 9:01 AM, Chris Riccomini
>>>>>>>>>>> <criccom...@linkedin.com.invalid> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hey Shekar,
>>>>>>>>>>>>
>>>>>>>>>>>> Your thought process is on the right track. It's probably best
>>>>>>>>>>>> to start with hello-samza, and modify it to get what you want.
>>>>>>>>>>>> To start with, you'll want to:
>>>>>>>>>>>>
>>>>>>>>>>>> 1. Write a simple StreamTask that just does something silly like
>>>>>>>>>>>>    just print messages that it receives.
>>>>>>>>>>>> 2. Write a configuration for the job that consumes from just the
>>>>>>>>>>>>    stream (alerts from different sources).
>>>>>>>>>>>> 3. Run this to make sure you've got it working.
>>>>>>>>>>>> 4. Now add your table join. This can be either a change-data
>>>>>>>>>>>>    capture (CDC) stream, or via a remote DB call.
>>>>>>>>>>>>
>>>>>>>>>>>> That should get you to a point where you've got your job up and
>>>>>>>>>>>> running. From there, you could create your own Maven project,
>>>>>>>>>>>> and migrate your code over accordingly.
>>>>>>>>>>>>
>>>>>>>>>>>> Cheers,
>>>>>>>>>>>> Chris
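Note that log.info in the snippet above goes through SLF4J, so with the NOP
binding from the top of the thread its output vanishes. Chris's step 1 can
sidestep that while debugging: a task that only prints. A minimal sketch, with
a hypothetical class name and the imports the quoted snippet omits:

  import org.apache.samza.system.IncomingMessageEnvelope;
  import org.apache.samza.task.MessageCollector;
  import org.apache.samza.task.StreamTask;
  import org.apache.samza.task.TaskCoordinator;

  public class PrintingStreamTask implements StreamTask {
    @Override
    public void process(IncomingMessageEnvelope envelope,
                        MessageCollector collector,
                        TaskCoordinator coordinator) {
      // stdout bypasses log4j/SLF4J, so this shows up even without a binding
      System.out.println("got message: " + envelope.getMessage());
    }
  }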
>>>>>>>>>>>>
>>>>>>>>>>>> On 8/24/14 1:42 AM, "Shekar Tippur" <ctip...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Chris,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I have gone through the documentation and decided that the
>>>>>>>>>>>>> option that is most suitable for me is stream-table.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I see the following things:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1. Point Samza to a table (database)
>>>>>>>>>>>>> 2. Point Samza to a stream - alert stream from different sources
>>>>>>>>>>>>> 3. Join key, like a hostname
>>>>>>>>>>>>>
>>>>>>>>>>>>> I have Hello Samza working. To extend that to do what my needs
>>>>>>>>>>>>> are, I am not sure where to start (needs more code change, OR
>>>>>>>>>>>>> configuration changes, OR both)?
>>>>>>>>>>>>>
>>>>>>>>>>>>> I have gone through
>>>>>>>>>>>>> http://samza.incubator.apache.org/learn/documentation/latest/api/overview.html
>>>>>>>>>>>>>
>>>>>>>>>>>>> Is my thought process on the right track? Can you please point
>>>>>>>>>>>>> me in the right direction?
>>>>>>>>>>>>>
>>>>>>>>>>>>> - Shekar
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Aug 21, 2014 at 1:08 PM, Shekar Tippur
>>>>>>>>>>>>> <ctip...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Chris,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This is a perfectly good answer. I will start poking more into
>>>>>>>>>>>>>> option #4.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> - Shekar
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, Aug 21, 2014 at 1:05 PM, Chris Riccomini
>>>>>>>>>>>>>> <criccom...@linkedin.com.invalid> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hey Shekar,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Your two options are really (3) or (4), then. You can either
>>>>>>>>>>>>>>> run some external DB that holds the data set, and you can
>>>>>>>>>>>>>>> query it from a StreamTask, or you can use Samza's state
>>>>>>>>>>>>>>> store feature to push data into a stream that you can then
>>>>>>>>>>>>>>> store in a partitioned key-value store along with your
>>>>>>>>>>>>>>> StreamTasks. There is some documentation here about the state
>>>>>>>>>>>>>>> store approach:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> http://samza.incubator.apache.org/learn/documentation/0.7.0/container/state-management.html
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> (4) is going to require more up-front effort from you, since
>>>>>>>>>>>>>>> you'll have to understand how Kafka's partitioning model
>>>>>>>>>>>>>>> works, and set up some pipeline to push the updates for your
>>>>>>>>>>>>>>> state. In the long run, I believe it's the better approach,
>>>>>>>>>>>>>>> though. Local lookups on a key-value store should be faster
>>>>>>>>>>>>>>> than doing remote RPC calls to a DB for every message.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I'm sorry I can't give you a more definitive answer. It's
>>>>>>>>>>>>>>> really about trade-offs.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>> Chris
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 8/21/14 12:22 PM, "Shekar Tippur" <ctip...@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Chris,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> A big thanks for a swift response. The data set is huge and
>>>>>>>>>>>>>>>> the frequency is in bursts. What do you suggest?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> - Shekar
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Thu, Aug 21, 2014 at 11:05 AM, Chris Riccomini
>>>>>>>>>>>>>>>> <criccom...@linkedin.com.invalid> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hey Shekar,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> This is feasible, and you are on the right thought process.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> For the sake of discussion, I'm going to pretend that you
>>>>>>>>>>>>>>>>> have a Kafka topic called "PageViewEvent", which has just
>>>>>>>>>>>>>>>>> the IP address that was used to view a page. These messages
>>>>>>>>>>>>>>>>> will be logged every time a page view happens. I'm also
>>>>>>>>>>>>>>>>> going to pretend that you have some state called "IPGeo"
>>>>>>>>>>>>>>>>> (e.g. the maxmind data set). In this example, we'll want to
>>>>>>>>>>>>>>>>> join the long/lat geo information from IPGeo to the
>>>>>>>>>>>>>>>>> PageViewEvent, and send it to a new topic:
>>>>>>>>>>>>>>>>> PageViewEventsWithGeo.
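To make the state-store side of this concrete in code: the task is handed a
local key-value store at init time and queries it per message. A minimal
sketch of the lookup side, using Chris's hypothetical IPGeo example; the store
name, serdes, and message shape are illustrative, and the store must be
declared under stores.* in the job properties per the state-management doc
linked above:

  import org.apache.samza.config.Config;
  import org.apache.samza.storage.kv.KeyValueStore;
  import org.apache.samza.system.IncomingMessageEnvelope;
  import org.apache.samza.task.*;

  public class IpGeoJoinTask implements StreamTask, InitableTask {
    private KeyValueStore<String, String> ipGeo;

    @Override
    @SuppressWarnings("unchecked")
    public void init(Config config, TaskContext context) {
      // "ip-geo" must match a stores.ip-geo.* block in the job config
      ipGeo = (KeyValueStore<String, String>) context.getStore("ip-geo");
    }

    @Override
    public void process(IncomingMessageEnvelope envelope,
                        MessageCollector collector,
                        TaskCoordinator coordinator) {
      String ip = (String) envelope.getMessage();
      String geo = ipGeo.get(ip); // local lookup, no remote RPC per message
      // ...attach geo to the event and collector.send(...) it onward
    }
  }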
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> You have several options on how to implement this example.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 1. If your joining data set (IPGeo) is relatively small and
>>>>>>>>>>>>>>>>>    changes infrequently, you can just pack it up in your jar
>>>>>>>>>>>>>>>>>    or .tgz file, and open it in every StreamTask.
>>>>>>>>>>>>>>>>> 2. If your data set is small, but changes somewhat
>>>>>>>>>>>>>>>>>    frequently, you can throw the data set on some
>>>>>>>>>>>>>>>>>    HTTP/HDFS/S3 server somewhere, and have your StreamTask
>>>>>>>>>>>>>>>>>    refresh it periodically by re-downloading it.
>>>>>>>>>>>>>>>>> 3. You can do remote RPC calls for the IPGeo data on every
>>>>>>>>>>>>>>>>>    page view event by querying some remote service or DB
>>>>>>>>>>>>>>>>>    (e.g. Cassandra).
>>>>>>>>>>>>>>>>> 4. You can use Samza's state feature to send your IPGeo data
>>>>>>>>>>>>>>>>>    as a series of messages to a log-compacted Kafka topic
>>>>>>>>>>>>>>>>>    (https://cwiki.apache.org/confluence/display/KAFKA/Log+Compaction),
>>>>>>>>>>>>>>>>>    and configure your Samza job to read this topic as a
>>>>>>>>>>>>>>>>>    bootstrap stream
>>>>>>>>>>>>>>>>>    (http://samza.incubator.apache.org/learn/documentation/0.7.0/container/streams.html).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> For (4), you'd have to partition the IPGeo state topic
>>>>>>>>>>>>>>>>> according to the same key as PageViewEvent. If PageViewEvent
>>>>>>>>>>>>>>>>> were partitioned by, say, member ID, but you want your IPGeo
>>>>>>>>>>>>>>>>> state topic to be partitioned by IP address, then you'd have
>>>>>>>>>>>>>>>>> to have an upstream job that re-partitioned PageViewEvent
>>>>>>>>>>>>>>>>> into some new topic by IP address. This new topic will have
>>>>>>>>>>>>>>>>> to have the same number of partitions as the IPGeo state
>>>>>>>>>>>>>>>>> topic (if IPGeo has 8 partitions, then the new
>>>>>>>>>>>>>>>>> PageViewEventRepartitioned topic needs 8 as well). This will
>>>>>>>>>>>>>>>>> cause your PageViewEventRepartitioned topic and your IPGeo
>>>>>>>>>>>>>>>>> state topic to be aligned such that the StreamTask that gets
>>>>>>>>>>>>>>>>> page views for IP address X will also have the IPGeo
>>>>>>>>>>>>>>>>> information for IP address X.
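A sketch of what option (4) looks like in the job properties, reusing Chris's
hypothetical topic names; the bootstrap/offset keys follow the 0.7.0 streams
documentation linked above, but treat the exact key names as
version-dependent:

  # consume the repartitioned events plus the state topic
  task.inputs=kafka.PageViewEventRepartitioned,kafka.IPGeo
  # replay IPGeo from the beginning and finish it before processing page views
  systems.kafka.streams.IPGeo.samza.bootstrap=true
  systems.kafka.streams.IPGeo.samza.reset.offset=true
  systems.kafka.streams.IPGeo.samza.offset.default=oldest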
This will cause your > >>PageViewEventRepartitioned > >> >>> >>topic > >> >>> >> >>and > >> >>> >> >> > > >>>your > >> >>> >> >> > > >>> >> IPGeo state topic to be aligned such that the > >> >>>StreamTask > >> >>> >>that > >> >>> >> >> gets > >> >>> >> >> > > >>>page > >> >>> >> >> > > >>> >> views for IP address X will also have the IPGeo > >> >>> >>information > >> >>> >> >>for > >> >>> >> >> IP > >> >>> >> >> > > >>> >>address > >> >>> >> >> > > >>> >> X. > >> >>> >> >> > > >>> >> > >> >>> >> >> > > >>> >> Which strategy you pick is really up to you. :) > >>(4) is > >> >>> the > >> >>> >> >>most > >> >>> >> >> > > >>> >> complicated, but also the most flexible, and most > >> >>> >> >>operationally > >> >>> >> >> > > >>>sound. > >> >>> >> >> > > >>> >>(1) > >> >>> >> >> > > >>> >> is the easiest if it fits your needs. > >> >>> >> >> > > >>> >> > >> >>> >> >> > > >>> >> Cheers, > >> >>> >> >> > > >>> >> Chris > >> >>> >> >> > > >>> >> > >> >>> >> >> > > >>> >> On 8/21/14 10:15 AM, "Shekar Tippur" > >> >>><ctip...@gmail.com> > >> >>> >> >>wrote: > >> >>> >> >> > > >>> >> > >> >>> >> >> > > >>> >> >Hello, > >> >>> >> >> > > >>> >> > > >> >>> >> >> > > >>> >> >I am new to Samza. I have just installed Hello > >>Samza > >> >>>and > >> >>> >> >>got it > >> >>> >> >> > > >>> >>working. > >> >>> >> >> > > >>> >> > > >> >>> >> >> > > >>> >> >Here is the use case for which I am trying to use > >> >>>Samza: > >> >>> >> >> > > >>> >> > > >> >>> >> >> > > >>> >> > > >> >>> >> >> > > >>> >> >1. Cache the contextual information which contains > >> >>>more > >> >>> >> >> > information > >> >>> >> >> > > >>> >>about > >> >>> >> >> > > >>> >> >the hostname or IP address using Samza/Yarn/Kafka > >> >>> >> >> > > >>> >> >2. Collect alert and metric events which contain > >> >>>either > >> >>> >> >> hostname > >> >>> >> >> > > >>>or IP > >> >>> >> >> > > >>> >> >address > >> >>> >> >> > > >>> >> >3. Append contextual information to the alert and > >> >>>metric > >> >>> >>and > >> >>> >> >> > > >>>insert to > >> >>> >> >> > > >>> >>a > >> >>> >> >> > > >>> >> >Kafka queue from which other subscribers read off > >>of. > >> >>> >> >> > > >>> >> > > >> >>> >> >> > > >>> >> >Can you please shed some light on > >> >>> >> >> > > >>> >> > > >> >>> >> >> > > >>> >> >1. Is this feasible? > >> >>> >> >> > > >>> >> >2. Am I on the right thought process > >> >>> >> >> > > >>> >> >3. How do I start > >> >>> >> >> > > >>> >> > > >> >>> >> >> > > >>> >> >I now have 1 & 2 of them working disparately. I > >>need > >> >>>to > >> >>> >> >> integrate > >> >>> >> >> > > >>> them. > >> >>> >> >> > > >>> >> > > >> >>> >> >> > > >>> >> >Appreciate any input. > >> >>> >> >> > > >>> >> > > >> >>> >> >> > > >>> >> >- Shekar > >> >>> >> >> > > >>> >> > >> >>> >> >> > > >>> >> > >> >>> >> >> > > >>> > >> >>> >> >> > > >>> > >> >>> >> >> > > >> > >> >>> >> >> > > > >> >>> >> >> > > > >> >>> >> >> > > >> >>> >> >> > >> >>> >> > >> >>> >> > >> >>> > >> >>> > >> >> > >> > >> > >