Podling Report Reminder - January 2017
Dear podling,

This email was sent by an automated system on behalf of the Apache Incubator PMC. It is an initial reminder to give you plenty of time to prepare your quarterly board report.

The board meeting is scheduled for Wed, 18 January 2017, 10:30 am PDT. The report for your podling will form a part of the Incubator PMC report. The Incubator PMC requires your report to be submitted 2 weeks before the board meeting, to allow sufficient time for review and submission (Wed, January 04).

Please submit your report with sufficient time to allow the Incubator PMC, and subsequently board members to review and digest. Again, the very latest you should submit your report is 2 weeks prior to the board meeting.

Thanks,

The Apache Incubator PMC

Submitting your Report
----------------------

Your report should contain the following:

* Your project name
* A brief description of your project, which assumes no knowledge of the project or necessarily of its field
* A list of the three most important issues to address in the move towards graduation.
* Any issues that the Incubator PMC or ASF Board might wish/need to be aware of
* How has the community developed since the last report
* How has the project developed since the last report.

This should be appended to the Incubator Wiki page at:

https://wiki.apache.org/incubator/January2017

Note: This is manually populated. You may need to wait a little before this page is created from a template.

Mentors
-------

Mentors should review reports for their project(s) and sign them off on the Incubator wiki page. Signing off reports shows that you are following the project - projects that are not signed may raise alarms for the Incubator PMC.

Incubator PMC
[jira] [Commented] (DATAFU-119) New UDF - TupleDiff
[ https://issues.apache.org/jira/browse/DATAFU-119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15793097#comment-15793097 ]

Eyal Allweil commented on DATAFU-119:
-------------------------------------

If we add DATAFU-123, we can include the macro I put in the description so that people can use it instead of duplicating it in order to conveniently call the UDF.

> New UDF - TupleDiff
> -------------------
>
>                 Key: DATAFU-119
>                 URL: https://issues.apache.org/jira/browse/DATAFU-119
>             Project: DataFu
>          Issue Type: New Feature
>            Reporter: Eyal Allweil
>            Assignee: Eyal Allweil
>
> A UDF that given two tuples, prints out the differences between them in human-readable form. This is not meant for production - we use it in PayPal for regression tests, to compare the results of two runs. Differences are calculated based on position, but the tuples' schemas are used, if available, for displaying more friendly results. If no schema is available the output uses field numbers.
> It should be used when you want a more fine-grained description of what has changed, unlike [org.apache.pig.builtin.DIFF|https://pig.apache.org/docs/r0.14.0/func.html#diff]. Also, because DIFF takes as its input two bags to be compared, they must fit in memory. This UDF only takes one pair of tuples at a time, so it can run on large inputs.
> We use a macro much like the following in conjunction with this UDF:
> {noformat}
> DEFINE diff_macro(diff_macro_old, diff_macro_new, diff_macro_pk, diff_macro_ignored_field) returns diffs {
>     DEFINE TupleDiff datafu.pig.util.TupleDiff;
>
>     old = FOREACH $diff_macro_old GENERATE $diff_macro_pk, TOTUPLE(*) AS original;
>     new = FOREACH $diff_macro_new GENERATE $diff_macro_pk, TOTUPLE(*) AS original;
>
>     join_data = JOIN new BY $diff_macro_pk full, old BY $diff_macro_pk;
>
>     join_data = FOREACH join_data GENERATE TupleDiff(old::original, new::original, '$diff_macro_ignored_field') AS tupleDiff, old::original, new::original;
>
>     $diffs = FILTER join_data BY tupleDiff IS NOT NULL ;
> };
> {noformat}
> Currently, the output from the macro looks like this (when comma-separated):
> {noformat}
> added,,
> missing,,
> changed field2 field4,,
> {noformat}
> The UDF takes a variable number of parameters - the two tuples to be compared, and any number of field names or numbers to be ignored. We use this to ignore fields representing execution or creation time (the macro I've given as an example assumes only one ignored field)
> The current implementation "drills down" into tuples, but not bags or maps - tuple boundaries are indicated with parentheses, like this:
> {noformat}
> changed outerEmbeddedTuple(innerEmbeddedTuple(fieldNameThatIsDifferent) innerEmbeddedTuple(anotherFieldThatIsDifferent))
> {noformat}
> I have a few final things left to do and then I'll put it up on reviewboard.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
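
The position-based comparison described in the issue can be sketched in plain Java. This is only an illustration over `java.util.List` rows, not the DataFu UDF and not the Pig `Tuple` API: the class name `TupleDiffSketch`, the method `diff`, the zero-based field numbers, and the reading of "added" (no old row) and "missing" (no new row) are all assumptions made for the example. It does not drill down into nested tuples.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

/** Illustrative only: a position-based diff over plain lists, not the DataFu UDF. */
public class TupleDiffSketch {

    /**
     * Compares two rows position by position.
     * fieldNames may be null (no schema); field numbers (zero-based, an assumption) are then reported.
     * Returns null when nothing differs, which matches the macro's "IS NOT NULL" filter.
     */
    public static String diff(List<?> oldRow, List<?> newRow,
                              List<String> fieldNames, String... ignored) {
        if (oldRow == null && newRow == null) return null;
        if (oldRow == null) return "added";    // assumed meaning: row exists only in the new run
        if (newRow == null) return "missing";  // assumed meaning: row exists only in the old run

        Set<String> skip = new HashSet<>(Arrays.asList(ignored));
        List<String> changed = new ArrayList<>();
        int size = Math.max(oldRow.size(), newRow.size());
        for (int i = 0; i < size; i++) {
            // Prefer the schema name when one is available, else fall back to the position.
            String label = (fieldNames != null && i < fieldNames.size())
                    ? fieldNames.get(i) : String.valueOf(i);
            // A field can be ignored either by name or by number.
            if (skip.contains(label) || skip.contains(String.valueOf(i))) continue;
            Object a = i < oldRow.size() ? oldRow.get(i) : null;
            Object b = i < newRow.size() ? newRow.get(i) : null;
            if (!Objects.equals(a, b)) changed.add(label);
        }
        return changed.isEmpty() ? null : "changed " + String.join(" ", changed);
    }
}
```

With a schema of `field1` through `field4` and differences in the second and fourth positions, `diff` returns a string in the same shape as the sample output in the issue; passing a field name or number as a trailing argument drops it from the comparison.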
Re: Review Request 55110: DATAFU-106 Test files are currently created in the subdirectory folder (e.g. datafu-pig/input*). For better organization, they should be created in a subdirectory.
> On Jan. 2, 2017, 9:29 a.m., Eyal Allweil wrote:
> > Hi Piyush,
> >
> > Thank you for your patch! It looks to me that it works fine - I ran our tests on Ubuntu, both from Eclipse and from the command line.
> >
> > I have two comments, one "real" and one just a typo:
> >
> > 1) From [this question](http://stackoverflow.com/a/840229/150992), it appears that changing the *user.dir* property in this way can be unpredictable - that it won't affect FileOutputStreams, for example. So although this works for our current tests (the FileOutputStream used in datafu.pig.linkanalysis.PageRank uses *File.createTempFile* instead of the working dir, so it's fine) I worry that this might not work in the future. Maybe add a comment about this in the *beforeClass* method?
> >
> > 2) "Location" is spelled as "loaction" in PigTests.java
> >
> > Cheers,
> > Eyal

Hi Eyal,

Thank you for reviewing it.

1) I actually tried doing it without changing user.dir. The Pig tests don't work that way (for example, LOAD 'input'... searches for the file in the working directory, which it apparently takes to be the same as the JVM's). Hence, this was the only way to solve the issue that I could think of. I also went through that Stack Overflow question while creating the patch, to resolve my own suspicions about the same thing. As even the accepted answer says, changing user.dir does affect subsequent file creation. In our case that happens via a call to writeLinesToFile, defined in PigTests.java, which always creates a new file, deleting the existing one if necessary, and then writes to it. Since writeLinesToFile also implicitly uses a FileOutputStream (via a FileWriter object) to write to files just created in the new user.dir, it seems that changing the working directory did actually affect the FileOutputStreams, at least as far as our project goes. Moreover, the afterMethod resets user.dir, so hopefully there aren't any consequences at all.
I can add a comment like "//Only some classes reflect the changes in working directory" or something like "//Developers are requested to just work with absoluteFiles and absolutePaths to avoid IOExceptions and such", but I feel it throws up a red flag unnecessarily, so should I?

2) I will surely correct the typo, thank you.

Cheerio,
Piyush


- Piyush


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55110/#review160317
-----------------------------------------------------------


On Dec. 31, 2016, 10:47 a.m., Piyush Sharma wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55110/
> -----------------------------------------------------------
>
> (Updated Dec. 31, 2016, 10:47 a.m.)
>
>
> Review request for DataFu.
>
>
> Repository: datafu
>
>
> Description
> -------
>
> DATAFU-106: Test files are currently created in the subdirectory folder (e.g. datafu-pig/input*). For better organization, they should be created in a subdirectory. This also makes it easier to exclude them all with gitignore.
> (issue: https://issues.apache.org/jira/browse/DATAFU-106)
>
>
> Diffs
> -----
>
>   datafu-pig/src/test/java/datafu/test/pig/PigTests.java d1d6fcc
>   datafu-pig/src/test/java/datafu/test/pig/test_filesSubdir/TestFilesSubdirTest.java PRE-CREATION
>
> Diff: https://reviews.apache.org/r/55110/diff/
>
>
> Testing
> -------
>
> unit tests passed.
>
>
> Thanks,
>
> Piyush Sharma
>
>
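
The "work with absolute files and paths" suggestion raised in the reply can be sketched as below. This is a minimal illustration under stated assumptions, not the code in the patch: the class `TestFilesSketch` is hypothetical, and this `writeLinesToFile` only borrows the name of the helper mentioned in the thread, with an assumed signature that takes the target directory explicitly. It does not change how Pig resolves a relative `LOAD 'input'` path, which is the reason the patch changes *user.dir* in the first place.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

/** Illustrative only: writes test input under an explicit directory instead of relying on user.dir. */
public class TestFilesSketch {

    /**
     * Writes the given lines to baseDir/name and returns the absolute path of the file.
     * An existing file is replaced, since Files.write creates or truncates by default.
     * The signature is an assumption made for this example.
     */
    public static Path writeLinesToFile(Path baseDir, String name, String... lines)
            throws IOException {
        // Resolve the directory to an absolute path once, so the returned path
        // can be used without further reference to the working directory.
        Path dir = baseDir.toAbsolutePath();
        Files.createDirectories(dir);
        Path target = dir.resolve(name);
        Files.write(target, Arrays.asList(lines), StandardCharsets.UTF_8);
        return target;
    }
}
```

A test could pass a per-class subdirectory as `baseDir` and hand the returned absolute path to whatever reads the file, which keeps the generated files together and easy to exclude with gitignore.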