I think Raman knows where to look for the test case(s) for AQL UDFs? (The answer to question 2 is presumably Yes.)
Chen

On Thu, Oct 29, 2015 at 12:22 PM, Jianfeng Jia <[email protected]> wrote:

> Hi Devs,
>
> I have two related questions,
> 1. Is there any example code of using UDF in feed-adapter?
> 2. Can we use AQL function in those kind of feed UDFs?
>
> Thank you.
>
> On Tue, Oct 27, 2015 at 9:54 PM, Michael Carey <[email protected]>
> wrote:
>
>> Thanks!
>>
>> On 10/27/15 9:48 AM, Raman Grover wrote:
>>
>>> Hi,
>>>
>>> In the case when data is being received from an external source (e.g.
>>> during feed ingestion), a slow rate of arrival of data may result in
>>> excessive delays until the data is deposited into the target dataset and
>>> made accessible to queries. Data moves along a data ingestion pipeline
>>> between operators as packed fixed size frames. The default behavior is to
>>> wait for the frame to be full before dispatching the contained data to the
>>> downstream operator. However, as noted, this may not suit all scenarios
>>> particularly when data source is sending data at a low rate. To cater to
>>> different scenarios, AsterixDB allows configuring the behavior. The
>>> different options are described next.
>>>
>>> *Push data downstream when*
>>> (a) Frame is full (default)
>>> (b) At least N records (data items) have been collected into a partially
>>> filled frame
>>> (c) At least T seconds have elapsed since the last record was put into
>>> the frame
>>>
>>> *How to configure the behavior?*
>>> At the time of defining a feed, an end-user may specify configuration
>>> parameters that determine the runtime behavior (options (a), (b) or (c)
>>> from above).
>>>
>>> The parameters are described below:
>>>
>>> /"parser-policy"/: A specific strategy chosen from a set of pre-defined
>>> values -
>>> (i) /"frame_full"/
>>> This is the default value. As the name suggests, this choice causes
>>> frames to be pushed by the feed adaptor only when there isn't sufficient
>>> space for an additional record to fit in. This corresponds to option (a).
>>>
>>> (ii) /"counter_timer_expired"/
>>> Use this as the value if you wish to set either option (b) or (c) or a
>>> combination of both.
>>>
>>> *Some Examples*
>>>
>>> 1) Pack a maximum of 100 records into a data frame and push it
>>> downstream.
>>>
>>> create feed my_feed using my_adaptor
>>> (("parser-policy"="counter_timer_expired"), ("batch-size"="100"), ...
>>> other parameters);
>>>
>>> 2) Wait 2 seconds, then send however many records have been collected
>>> in a frame downstream.
>>>
>>> create feed my_feed using my_adaptor
>>> (("parser-policy"="counter_timer_expired"), ("batch-interval"="2"), ...
>>> other parameters);
>>>
>>> 3) Wait till 100 records have been collected into a data frame or 2
>>> seconds have elapsed since the last record was put into the current data
>>> frame.
>>>
>>> create feed my_feed using my_adaptor
>>> (("parser-policy"="counter_timer_expired"), ("batch-interval"="2"),
>>> ("batch-size"="100"), ... other parameters);
>>>
>>> *Note*
>>> The above config parameters are not specific to using a particular
>>> implementation of an adaptor but are available for use with any feed
>>> adaptor. Some adaptors that ship with AsterixDB use different default
>>> values for above to suit their specific scenario. E.g. the pull-based
>>> twitter adaptor uses "counter_timer_expired" as the "parser-policy" and
>>> sets the parameter "batch-interval".
>>>
>>> Regards,
>>> Raman
>>> PS: The names of the parameters described above are not as intuitive as
>>> one would like them to be. The names need to be changed.
>>>
>>> On Thu, Oct 22, 2015 at 9:09 AM, Mike Carey <[email protected]> wrote:
>>>
>>>     I think we need to have tuning parameters - like batch size and
>>>     maximum tolerable latency (in case there's a lull and you still
>>>     want to push stuff with some worst-case delay). @Raman Grover -
>>>     remind me (us) what's available in this regard?
>>>
>>> On 10/22/15 4:29 AM, Pääkkönen Pekka wrote:
>>>
>>>> Hi,
>>>>
>>>> Yes, you are right. I tried sending a larger amount of data, and
>>>> data is now stored to the database.
>>>>
>>>> Does it make sense to configure a smaller batch size in order to
>>>> get more frequent writes?
>>>>
>>>> Or would it significantly impact performance?
>>>>
>>>> -Pekka
>>>>
>>>> Data moves through the pipeline in frame-sized batches, so one
>>>> (uninformed :-)) guess is that you aren't running very long, and you're
>>>> only seeing the data flow when you close because only then do you have a
>>>> batch's worth. Is that possible? You can test this by running longer
>>>> (more data) and seeing if you start to see the expected incremental
>>>> flow/inserts. (And we need tunability in this area, e.g., parameters on
>>>> how much batching and/or how much latency to tolerate on each feed.)
>>>>
>>>> On 10/21/15 4:45 AM, Pääkkönen Pekka wrote:
>>>>
>>>> > Hi,
>>>> >
>>>> > Thanks, now I am able to create a socket feed, and save items to the
>>>> > dataset from the feed.
>>>> >
>>>> > It seems that data items are written to the dataset after I close the
>>>> > socket at the client.
>>>> >
>>>> > Is there some way to indicate to AsterixDB feed (with a newline or
>>>> > other indicator) that data can be written to the database, when the
>>>> > connection is open?
>>>> >
>>>> > After I close the socket at the client, the feed seems to close down.
>>>> > Or is it only paused, until it is resumed?
>>>> >
>>>> > -Pekka
>>>> >
>>>> > Hi Pekka,
>>>> >
>>>> > That's interesting, I'm not sure why the CC would appear as being down
>>>> > to Managix. However if you can access the web console, that
>>>> > evidently isn't the case.
>>>> >
>>>> > As for data ingestion via sockets, yes it is possible, but it kind of
>>>> > depends on what's meant by sockets. There's no tutorial for it, but
>>>> > take a look at SocketBasedFeedAdapter in the source, as well as
>>>> > https://github.com/kisskys/incubator-asterixdb/blob/kisskys/indexonlyhilbertbtree/asterix-experiments/src/main/java/org/apache/asterix/experiment/client/SocketTweetGenerator.java
>>>> > for some examples of how it works.
>>>> >
>>>> > Hope that helps!
>>>> >
>>>> > Thanks,
>>>> > -Ian
>>>> >
>>>> > On Mon, Oct 19, 2015 at 10:15 PM, Pääkkönen Pekka
>>>> > <[email protected]> wrote:
>>>> > > Hi Ian,
>>>> > >
>>>> > > Thanks for the reply.
>>>> > >
>>>> > > I compiled AsterixDB v0.87 and started it.
>>>> > >
>>>> > > However, I get the following warnings:
>>>> > >
>>>> > > INFO: Name:my_asterix
>>>> > > Created:Mon Oct 19 08:37:16 UTC 2015
>>>> > > Web-Url:http://192.168.101.144:19001
>>>> > > State:UNUSABLE
>>>> > >
>>>> > > WARNING!:Cluster Controller not running at master
>>>> > >
>>>> > > Also, I see the following warnings in my_asterixdb1.log. there are no
>>>> > > warnings or errors in cc.log
>>>> > >
>>>> > > “
>>>> > > Oct 19, 2015 8:37:39 AM
>>>> > > org.apache.hyracks.api.lifecycle.LifeCycleComponentManager configure
>>>> > > SEVERE: LifecycleComponentManager configured
>>>> > > org.apache.hyracks.api.lifecycle.LifeCycleComponentManager@7559ec47
>>>> > > ..
>>>> > > INFO: Completed sharp checkpoint.
>>>> > >
>>>> > > Oct 19, 2015 8:37:40 AM org.apache.asterix.om.util.AsterixClusterProperties
>>>> > > getIODevices
>>>> > > WARNING: Configuration parameters for nodeId my_asterix_node1 not found. The
>>>> > > node has not joined yet or has left.
>>>> > >
>>>> > > Oct 19, 2015 8:37:40 AM org.apache.asterix.om.util.AsterixClusterProperties
>>>> > > getIODevices
>>>> > > WARNING: Configuration parameters for nodeId my_asterix_node1 not found. The
>>>> > > node has not joined yet or has left.
>>>> > >
>>>> > > Oct 19, 2015 8:38:38 AM
>>>> > > org.apache.hyracks.control.common.dataset.ResultStateSweeper sweep
>>>> > > INFO: Result state cleanup instance successfully completed.”
>>>> > >
>>>> > > It seems that AsterixDB is running, and I can access it at port 19001.
>>>> > >
>>>> > > The documentation shows ingestion of tweets, but I would be interested in
>>>> > > using sockets.
>>>> > >
>>>> > > Is it possible to ingest data from sockets?
>>>> > >
>>>> > > Regards,
>>>> > > -Pekka
>>>> > >
>>>> > > Hey there Pekka,
>>>> > >
>>>> > > Your intuition is correct, most of the newer feeds features are in the
>>>> > > current master branch and not in the (very) old 0.8.6 release. If you'd
>>>> > > like to experiment with them you'll have to build from source.
>>>> > > The details about that are here:
>>>> > > https://asterixdb.incubator.apache.org/dev-setup.html#setting-up-an-asterix-development-environment-in-eclipse
>>>> > > , but they're probably a bit overkill for just trying to get the compiled
>>>> > > binaries. For that all you really need to do is :
>>>> > > - Clone Hyracks from git
>>>> > > - 'mvn clean install -DskipTests'
>>>> > > - Clone AsterixDB
>>>> > > - 'mvn clean package -DskipTests'
>>>> > > Then, the binaries will sit in asterix-installer/target
>>>> > >
>>>> > > For an example, the documentation shows how to set up a feed that's
>>>> > > ingesting Tweets:
>>>> > > https://asterix-jenkins.ics.uci.edu/job/asterix-test-full/site/asterix-doc/feeds/tutorial.html
>>>> > >
>>>> > > Thanks,
>>>> > > -Ian
>>>> > >
>>>> > > On Wed, Oct 7, 2015 at 9:48 PM, Pääkkönen Pekka <[email protected]>
>>>> > > wrote:
>>>> > >
>>>> > >> Hi,
>>>> > >>
>>>> > >> I would like to experiment with a socket-based feed.
>>>> > >>
>>>> > >> Can you point me to an example on how to utilize them?
>>>> > >>
>>>> > >> Do I need to install 0.8.7-snapshot version of AsterixDB in order to
>>>> > >> experiment with feeds?
>>>> > >>
>>>> > >> Regards,
>>>> > >>
>>>> > >> -Pekka Pääkkönen
>>>
>>> --
>>> Raman
>
> --
>
> -----------------
> Best Regards
>
> Jianfeng Jia
> Ph.D. Candidate of Computer Science
> University of California, Irvine
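
For anyone skimming the thread, here is a rough sketch of how the three push policies Raman describes (frame full, at least N records, at least T seconds since the last record) interact. This is a toy model in Python, not AsterixDB code: the class FrameBatcher and its methods are made up for illustration, frame capacity is counted in records rather than bytes as a simplification, and only the names batch_size and batch_interval are borrowed from the "batch-size" and "batch-interval" parameters quoted above.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class FrameBatcher:
    """Toy model of the push policies from the thread (hypothetical, not AsterixDB code)."""

    frame_capacity: int                     # records per frame (simplification: real frames are byte-sized)
    batch_size: Optional[int] = None        # option (b): push once N records are collected
    batch_interval: Optional[float] = None  # option (c): push T seconds after the last record
    clock: Callable[[], float] = time.monotonic
    _frame: List[str] = field(default_factory=list)
    _last_add: float = 0.0

    def add(self, record: str) -> Optional[List[str]]:
        """Append a record; return the pushed frame if a push is due, else None."""
        self._frame.append(record)
        self._last_add = self.clock()
        if len(self._frame) >= self.frame_capacity:
            return self._flush()  # option (a): frame is full
        if self.batch_size is not None and len(self._frame) >= self.batch_size:
            return self._flush()  # option (b): at least N records collected
        return None

    def tick(self) -> Optional[List[str]]:
        """Call periodically; pushes a partial frame once T seconds have passed
        since the last record was added (option (c))."""
        if (
            self.batch_interval is not None
            and self._frame
            and self.clock() - self._last_add >= self.batch_interval
        ):
            return self._flush()
        return None

    def _flush(self) -> List[str]:
        # Hand the current frame downstream and start a fresh one.
        out, self._frame = self._frame, []
        return out
```

With batch_size and batch_interval both left unset, the sketch only pushes when the frame fills up, which matches the behaviour Pekka saw with a slow socket source. Setting either one lets a partially filled frame go out earlier, at the cost of more, smaller pushes.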
