Podling Report Reminder - January 2017

2017-01-02 Thread johndament
Dear podling,

This email was sent by an automated system on behalf of the Apache
Incubator PMC. It is an initial reminder to give you plenty of time to
prepare your quarterly board report.

The board meeting is scheduled for Wed, 18 January 2017, 10:30 am PST.
The report for your podling will form a part of the Incubator PMC
report. The Incubator PMC requires your report to be submitted 2 weeks
before the board meeting, to allow sufficient time for review and
submission (Wed, January 04).

Please submit your report with sufficient time to allow the Incubator
PMC, and subsequently board members to review and digest. Again, the
very latest you should submit your report is 2 weeks prior to the board
meeting.

Thanks,

The Apache Incubator PMC

Submitting your Report
---

Your report should contain the following:

*   Your project name
*   A brief description of your project, which assumes no knowledge of
the project or necessarily of its field
*   A list of the three most important issues to address in the move
towards graduation
*   Any issues that the Incubator PMC or ASF Board might wish/need to be
aware of
*   How the community has developed since the last report
*   How the project has developed since the last report

This should be appended to the Incubator Wiki page at:

https://wiki.apache.org/incubator/January2017

Note: This is manually populated. You may need to wait a little before
this page is created from a template.

Mentors
---

Mentors should review reports for their project(s) and sign them off on
the Incubator wiki page. Signing off reports shows that you are
following the project - projects that are not signed may raise alarms
for the Incubator PMC.

Incubator PMC


[jira] [Commented] (DATAFU-119) New UDF - TupleDiff

2017-01-02 Thread Eyal Allweil (JIRA)

[ 
https://issues.apache.org/jira/browse/DATAFU-119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15793097#comment-15793097
 ] 

Eyal Allweil commented on DATAFU-119:
-

If we add DATAFU-123, we can include the macro I put in the description so that 
people can use it instead of duplicating it in order to conveniently call the 
UDF.

> New UDF - TupleDiff
> ---
>
> Key: DATAFU-119
> URL: https://issues.apache.org/jira/browse/DATAFU-119
> Project: DataFu
>  Issue Type: New Feature
>Reporter: Eyal Allweil
>Assignee: Eyal Allweil
>
> A UDF that, given two tuples, prints out the differences between them in 
> human-readable form. This is not meant for production - we use it at PayPal 
> for regression tests, to compare the results of two runs. Differences are 
> calculated based on position, but the tuples' schemas are used, if available, 
> for displaying more friendly results. If no schema is available the output 
> uses field numbers.
> It should be used when you want a more fine-grained description of what has 
> changed, unlike 
> [org.apache.pig.builtin.DIFF|https://pig.apache.org/docs/r0.14.0/func.html#diff].
>  Also, because DIFF takes as its input two bags to be compared, they must fit 
> in memory. This UDF only takes one pair of tuples at a time, so it can run on 
> large inputs.
> We use a macro much like the following in conjunction with this UDF:
> {noformat}
> DEFINE diff_macro(diff_macro_old, diff_macro_new, diff_macro_pk, diff_macro_ignored_field) returns diffs {
>   DEFINE TupleDiff datafu.pig.util.TupleDiff;
>   
>   old = FOREACH $diff_macro_old GENERATE $diff_macro_pk, TOTUPLE(*) AS original;
>   new = FOREACH $diff_macro_new GENERATE $diff_macro_pk, TOTUPLE(*) AS original;
>   
>   join_data = JOIN new BY $diff_macro_pk full, old BY $diff_macro_pk;
>   
>   join_data = FOREACH join_data GENERATE TupleDiff(old::original, new::original, '$diff_macro_ignored_field') AS tupleDiff, old::original, new::original;
>   
>   $diffs = FILTER join_data BY tupleDiff IS NOT NULL;
> };
> {noformat}
> Currently, the output from the macro looks like this (when comma-separated):
> {noformat}
> added,,
> missing,,
> changed field2 field4,,
> {noformat}
> The UDF takes a variable number of parameters - the two tuples to be 
> compared, and any number of field names or numbers to be ignored. We use this 
> to ignore fields representing execution or creation time (the macro I've 
> given as an example assumes only one ignored field).
> The current implementation "drills down" into tuples, but not bags or maps - 
> tuple boundaries are indicated with parentheses, like this:
> {noformat}
> changed outerEmbeddedTuple(innerEmbeddedTuple(fieldNameThatIsDifferent) innerEmbeddedTuple(anotherFieldThatIsDifferent))
> {noformat}
> I have a few final things left to do and then I'll put it up on reviewboard.
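
To make the comparison described above concrete, here is a minimal Java sketch of a position-based diff over flat tuples. It is not the code of datafu.pig.util.TupleDiff: the class name TupleDiffSketch, the method signature, and the use of plain lists in place of Pig tuples are all assumptions made for illustration. It also assumes that a null old tuple means "added" and a null new tuple means "missing", and it does not drill down into nested tuples.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

/** Hypothetical illustration only; this is not datafu.pig.util.TupleDiff. */
public class TupleDiffSketch {

    /**
     * Compares two flat tuples by position. Returns null when they match,
     * "added" or "missing" when one side is absent (assumed meaning), and
     * otherwise "changed" followed by the names of the differing fields.
     * fieldNames may be null, in which case field numbers are reported.
     */
    public static String diff(List<?> oldTuple, List<?> newTuple,
                              List<String> fieldNames, String... ignored) {
        if (oldTuple == null && newTuple == null) return null;
        if (oldTuple == null) return "added";
        if (newTuple == null) return "missing";

        Set<String> ignoredFields = new HashSet<>(Arrays.asList(ignored));
        List<String> changed = new ArrayList<>();
        int size = Math.max(oldTuple.size(), newTuple.size());
        for (int i = 0; i < size; i++) {
            // Use the schema name when one is available, else the position.
            String name = (fieldNames != null && i < fieldNames.size())
                    ? fieldNames.get(i) : String.valueOf(i);
            // A field can be ignored either by name or by number.
            if (ignoredFields.contains(name)
                    || ignoredFields.contains(String.valueOf(i))) {
                continue;
            }
            Object a = i < oldTuple.size() ? oldTuple.get(i) : null;
            Object b = i < newTuple.size() ? newTuple.get(i) : null;
            if (!Objects.equals(a, b)) changed.add(name);
        }
        return changed.isEmpty() ? null : "changed " + String.join(" ", changed);
    }
}
```

Returning null for identical tuples mirrors the macro's final FILTER step, which keeps only rows where tupleDiff IS NOT NULL.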





Re: Review Request 55110: DATAFU-106 Test files are currently created in the subdirectory folder (e.g. datafu-pig/input*). For better organization, they should be created in a subdirectory.

2017-01-02 Thread Piyush Sharma


> On Jan. 2, 2017, 9:29 a.m., Eyal Allweil wrote:
> > Hi Piyush,
> > 
> > Thank you for your patch! It looks to me that it works fine - I ran our 
> > tests on Ubuntu, both from Eclipse and from the command line.
> > 
> > I have two comments, one "real" and one just a typo:
> > 
> > 1) From [this question](http://stackoverflow.com/a/840229/150992), it 
> > appears that changing the *user.dir* property in this way can be 
> > unpredictable - that it won't affect FileOutputStreams, for example. So 
> > although this works for our current tests (the FileOutputStream used in 
> > datafu.pig.linkanalysis.PageRank uses *File.createTempFile* instead of the 
> > working dir, so it's fine) I worry that this might not work in the future. 
> > Maybe add a comment about this in the *beforeClass* method?
> > 
> > 2) "Location" is spelled as "loaction" in PigTests.java
> > 
> > Cheers,
> > Eyal

Hi Eyal,

Thank you for reviewing it.

1) I did try doing it without changing user.dir, but the Pig tests don't work 
that way. LOAD 'input'..., for example, searches for the file in the working 
directory, which Pig apparently takes to be the same as the JVM's. Hence, this 
was the only way to solve the issue that I could think of. 

I also went through that Stack Overflow question while creating the patch, 
because I had the same suspicion. As even the accepted answer says, changing 
user.dir does affect subsequent file creation. In our case that happens via a 
call to writeLinesToFile, defined in PigTests.java, which always creates a new 
file (deleting the existing one if necessary) and then writes to it. Since 
writeLinesToFile implicitly uses a FileOutputStream (via a FileWriter object) 
to write to files just created in the new user.dir, changing the working 
directory did affect the FileOutputStreams, at least as far as our project 
goes. Moreover, the afterMethod resets user.dir, so hopefully there aren't any 
lasting consequences. 

I can add a comment like "//Only some classes reflect the changes in working 
directory" or "//Developers are requested to just work with absoluteFiles and 
absolutePaths to avoid IOExceptions and such", but I feel it raises a red flag 
unnecessarily. Should I?

2) I will surely correct the typo, thank you.
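
On point 1, one way to keep the file-writing side independent of user.dir is to resolve every test file against an explicit base directory. The sketch below shows that idea in Java. It is not the writeLinesToFile from PigTests.java: the class name TestFileWriterSketch and the baseDir parameter are hypothetical, and it does not address how Pig's LOAD resolves relative paths.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

/** Hypothetical helper; not the writeLinesToFile from PigTests.java. */
public class TestFileWriterSketch {

    /**
     * Writes the given lines to baseDir/fileName, replacing any existing file.
     * The target is resolved against baseDir explicitly, so when baseDir is
     * absolute the result does not depend on the user.dir system property.
     */
    public static Path writeLinesToFile(Path baseDir, String fileName, String... lines)
            throws IOException {
        Path target = baseDir.toAbsolutePath().resolve(fileName);
        // Create missing parent directories, e.g. a per-test subdirectory.
        Files.createDirectories(target.getParent());
        // Always start from a fresh file, as the reply describes.
        Files.deleteIfExists(target);
        return Files.write(target, Arrays.asList(lines), StandardCharsets.UTF_8);
    }
}
```

A test could pass the directory it wants its inputs in, for example a temporary directory created with Files.createTempDirectory, and use the returned absolute path from then on.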

Cheerio,
Piyush


- Piyush


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55110/#review160317
---


On Dec. 31, 2016, 10:47 a.m., Piyush  Sharma wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55110/
> ---
> 
> (Updated Dec. 31, 2016, 10:47 a.m.)
> 
> 
> Review request for DataFu.
> 
> 
> Repository: datafu
> 
> 
> Description
> ---
> 
> DATAFU-106: Test files are currently created in the subdirectory folder (e.g. 
> datafu-pig/input*). For better organization, they should be created in a 
> subdirectory. This also makes it easier to exclude them all with gitignore. 
> (issue: https://issues.apache.org/jira/browse/DATAFU-106)
> 
> 
> Diffs
> -
> 
>   datafu-pig/src/test/java/datafu/test/pig/PigTests.java d1d6fcc 
>   datafu-pig/src/test/java/datafu/test/pig/test_filesSubdir/TestFilesSubdirTest.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/55110/diff/
> 
> 
> Testing
> ---
> 
> unit tests passed.
> 
> 
> Thanks,
> 
> Piyush  Sharma
> 
>