[jira] [Created] (ARROW-1626) Add make targets to run the inter-procedural static analysis tool called "infer".

2017-09-29 Thread Rene Sugar (JIRA)
Rene Sugar created ARROW-1626:
-

 Summary: Add make targets to run the inter-procedural static 
analysis tool called "infer".
 Key: ARROW-1626
 URL: https://issues.apache.org/jira/browse/ARROW-1626
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Rene Sugar
 Attachments: report.txt

Add make targets to run the inter-procedural static analysis tool called 
"infer".

I have attached the output of running infer.

https://github.com/facebook/infer

http://fbinfer.com/docs/getting-started.html

http://fbinfer.com/docs/steps-for-ci.html

http://fbinfer.com/docs/advanced-features.html

http://fbinfer.com/docs/infer-bug-types.html


1) Build the project with Clang to create a compilation database.

2) Run infer's capture step

make infer
Scanning dependencies of target infer
Capturing using compilation database...
Starting translating 66 files 

3) Run infer's analyze step. This can take a long time.

make infer-analyze
Scanning dependencies of target infer-analyze
Found 66 source files to analyze in 
/Users/rene/projects/arrow/cpp/debug/infer-out
Starting analysis...

legend:
  "F" analyzing a file
  "." analyzing a procedure

4) Run infer's report step.

make infer-report
Scanning dependencies of target infer-report
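
The three steps above could be chained from a small driver script. This is only a sketch: the target names (infer, infer-analyze, infer-report) are the ones proposed in this issue, while the build directory argument and the helper names are assumptions made for illustration.

```python
import subprocess

# Target names come from the issue text; the build directory is an assumption.
INFER_TARGETS = ["infer", "infer-analyze", "infer-report"]


def infer_commands(build_dir):
    """Return the make invocations for capture, analyze and report, in order."""
    return [["make", "-C", build_dir, target] for target in INFER_TARGETS]


def run_infer(build_dir, runner=subprocess.check_call):
    """Run each step in sequence; check_call raises on the first failure."""
    for command in infer_commands(build_dir):
        runner(command)
```

Calling run_infer("cpp/debug") would then run capture, analysis and reporting in order, stopping at the first step that fails. The runner parameter exists only so the command sequence can be inspected without invoking make.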




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (ARROW-1625) [Serialization] Support OrderedDict properly

2017-09-29 Thread Philipp Moritz (JIRA)
Philipp Moritz created ARROW-1625:
-

 Summary: [Serialization] Support OrderedDict properly
 Key: ARROW-1625
 URL: https://issues.apache.org/jira/browse/ARROW-1625
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Reporter: Philipp Moritz


At the moment, when we serialize an OrderedDict and then 
deserialize it, it becomes a normal dict! This can be reproduced with

{code}
import pyarrow
import collections
d = collections.OrderedDict([("hello", 1), ("world", 2)])
type(pyarrow.serialize(d).deserialize())
{code}

which will return "dict". See also 
https://github.com/ray-project/ray/issues/1034.
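
One possible way to keep the type through a round trip is to tag the OrderedDict explicitly before handing it to the serializer. The sketch below is not how pyarrow works; it uses json as a stand-in for any format that only knows plain dicts, and the marker key and the encode/decode helpers are hypothetical names.

```python
import collections
import json

_TAG = "__ordered_dict__"  # hypothetical marker key, not a pyarrow convention


def encode(obj):
    """Replace an OrderedDict with a tagged list of [key, value] pairs."""
    if isinstance(obj, collections.OrderedDict):
        return {_TAG: [[k, encode(v)] for k, v in obj.items()]}
    if isinstance(obj, dict):
        return {k: encode(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [encode(v) for v in obj]
    return obj


def decode(obj):
    """Rebuild OrderedDict instances from the tagged form produced by encode."""
    if isinstance(obj, dict):
        if set(obj) == {_TAG}:
            return collections.OrderedDict((k, decode(v)) for k, v in obj[_TAG])
        return {k: decode(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [decode(v) for v in obj]
    return obj


d = collections.OrderedDict([("hello", 1), ("world", 2)])
restored = decode(json.loads(json.dumps(encode(d))))
```

Here restored is an OrderedDict again with the original key order, because the pairs travel as a list and the tag tells the decoder which container to rebuild.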





[jira] [Created] (ARROW-1624) [C++] Follow up fixes / tweaks to compiler warnings for Plasma / LLVM 4.0, add to readme

2017-09-29 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-1624:
---

 Summary: [C++] Follow up fixes / tweaks to compiler warnings for 
Plasma / LLVM 4.0, add to readme
 Key: ARROW-1624
 URL: https://issues.apache.org/jira/browse/ARROW-1624
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Wes McKinney
Assignee: Wes McKinney
 Fix For: 0.8.0


Not sure why these failures weren't caught in CI





Re: Inconsistent naming of list column type in parquet (appending .list.element to column name) ?

2017-09-29 Thread Wes McKinney
It looks like some images are missing from your post. Can you update to
0.7.0?

On Fri, Sep 29, 2017 at 3:19 PM, Abdul Rahman 
wrote:

> I have a parquet file which has a column of type list. When loading the
> particular column from the file by name, it gives me an error specifying
> invalid key error
>
> A deeper look in ColumnSchema of the particular column revealed that the
> column name is listed as .list.element ? Is this a bug
>
> Here's the schema of the file
>
>
>
>
> Here's the ColumnSchema
>
>
>
> If i read the column like this, it works
>
>
>
> I am using version 0.4.0. Is this solved in later releases ?
>


Inconsistent naming of list column type in parquet (appending .list.element to column name) ?

2017-09-29 Thread Abdul Rahman
I have a parquet file which has a column of type list. When loading the 
particular column from the file by name, it gives me an invalid key error.

A deeper look at the ColumnSchema of the particular column revealed that the column 
name is listed with .list.element appended. Is this a bug?

Here's the schema of the file

[inline image not included]

Here's the ColumnSchema

[inline image not included]

If i read the column like this, it works

[inline image not included]

I am using version 0.4.0. Is this solved in later releases ?
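
For context, Parquet's standard three-level list layout nests the values under a group named "list" with a field named "element", so the leaf column path is the field name followed by .list.element. A minimal sketch of relating a top-level name to such paths, using hypothetical column names and helper functions (this is not a pyarrow API):

```python
def top_level_name(leaf_path):
    """Return the top-level field name of a dotted Parquet leaf path."""
    return leaf_path.split(".", 1)[0]


def leaves_for(column, leaf_paths):
    """Return the leaf paths that belong to the given top-level column."""
    return [p for p in leaf_paths if top_level_name(p) == column]


# Hypothetical leaf paths; "tags" stands in for the list column in question.
paths = ["id", "tags.list.element"]
matches = leaves_for("tags", paths)
```

This assumes field names do not themselves contain dots; a name with an embedded dot would need a different lookup.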


Re: [VOTE] Release Apache Arrow 0.7.1 - RC1

2017-09-29 Thread Kouhei Sutou
Hi,

In <1506608524.3455791.1121326136.2abdb...@webmail.messagingengine.com>
  "Re: [VOTE] Release Apache Arrow 0.7.1 - RC1" on Thu, 28 Sep 2017 16:22:04 
+0200,
  "Uwe L. Korn"  wrote:

> Could not verify c_glib due to "in `': uninitialized constant GI
> (NameError)" in the tests. But I guess this is due to a not correct
> setup environment.

Yes. The error means that the gobject-introspection gem is missing
("gem install gobject-introspection"). It's needed only for the tests.
The c_glib tests weren't run, but the c_glib build finished successfully.


Thanks,
--
kou


Re: [VOTE] Release Apache Arrow 0.7.1 - RC1

2017-09-29 Thread Kouhei Sutou
+1 (binding)

I used dev/release/verify-release-candidate.sh and

* Verified signature, checksum on Debian GNU/Linux sid
* Ran C++ unit tests
* Ran Python unit tests
* Ran C unit tests


Thanks,
--
kou

In 
  "[VOTE] Release Apache Arrow 0.7.1 - RC1" on Wed, 27 Sep 2017 10:01:33 -0400,
  Wes McKinney wrote:

> Hello all,
> 
> I'd like to propose the 2nd release candidate (rc1) of Apache
> Arrow version 0.7.1.  This is a bugfix release from 0.7.0. The only
> difference between rc1 and rc0 was fixing an issue with the source
> release build for Windows users.
> 
> The source release rc1 is hosted at [1].
> 
> This release candidate is based on commit
> 0e21f84c2fc26dba949a03ee7d7ebfade0a65b81 [2]
> 
> The changelog is located at [3].
> 
> Please download, verify checksums and signatures, run the unit
> tests, and vote on the release. Consider using the release
> verification scripts in [4].
> 
> The vote will be open for at least 72 hours.
> 
> [ ] +1 Release this as Apache Arrow 0.7.1
> [ ] +0
> [ ] -1 Do not release this as Apache Arrow 0.7.1 because...
> 
> Thanks,
> Wes
> 
> How to validate a release signature:
> https://httpd.apache.org/dev/verification.html
> 
> [1]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-0.7.1-rc1/
> [2]: 
> https://github.com/apache/arrow/tree/0e21f84c2fc26dba949a03ee7d7ebfade0a65b81
> [3]: 
> https://git-wip-us.apache.org/repos/asf?p=arrow.git;a=blob_plain;f=CHANGELOG.md;hb=0e21f84c2fc26dba949a03ee7d7ebfade0a65b81
> [4]: https://github.com/apache/arrow/tree/master/dev/release
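
The checksum part of the verification requested above amounts to hashing the downloaded artifact and comparing the result with the published digest. The sketch below only illustrates that idea; the actual checks are performed by the release verification scripts, and the hash algorithm, function names and file path here are assumptions.

```python
import hashlib


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash a file in chunks so large tarballs need not fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path, expected_hex, algorithm="sha256"):
    """Compare the computed digest with the published one, ignoring case."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()
```

A checksum only detects corruption; the signature check described at the httpd verification link is still needed to establish who produced the artifact.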


[jira] [Created] (ARROW-1623) [C++] Add convenience method to construct Buffer from a string that owns its memory

2017-09-29 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-1623:
---

 Summary: [C++] Add convenience method to construct Buffer from a 
string that owns its memory
 Key: ARROW-1623
 URL: https://issues.apache.org/jira/browse/ARROW-1623
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Wes McKinney
 Fix For: 0.8.0


The memory would need to be allocated from a memory pool / buffer allocator





Re: Reminder: Arrow sync call tomorrow

2017-09-29 Thread Wes McKinney
Sorry for the delay with the meeting notes. The next sync call will be
on Wednesday 10/4 at 16:00 UTC. I need help setting up a recurring
meeting invite with Google Meet (or equivalent) so that we can
accommodate more than 10 participants. @Jacques can you help?

Here is a summary of topics

- Wes (Two Sigma)
  - 0.8.0 Roadmap
- Li Jin (Two Sigma)
  - ARROW-1463
- Jacques (Dremio)
  - ARROW-1463
- Heimir (Mojotech)
  - Documentation automation
- Siddharth (Dremio)
  - Java development
  - ARROW-1463 - to send out requirements docs
- Max Grossman (Houston)
- Tom Augspurger (Anaconda)
  - Arrow + MapD + Python API
- Julien (Berkeley)
- Uwe (Blue Yonder)
  - Packaging for Linux / linking problems
  - Decimals
- Phillip (Two Sigma)
- Bryan Cutler (IBM)

- 0.7.0
  - Release verification script
- 0.8.0
  - ARROW-352 - Interval type feedback
    - PARQUET-675
    - Java: IntervalDay compound type with days and milliseconds
    - Accommodating nanoseconds
    - How to handle user intent
  - Continue to harden logical types

- Decimals
  - Drill had 4 decimals
  - Type promotions happen early in type tree, benefits of smaller storage
  - Create JIRA about hardening details in schema documents
  - Is big-endian the best in-memory format for decimals?
    - Need to do some performance experiments
  - Phillip C to investigate

- ARROW-1463
  - Moving code out of templates
  - Siddharth to send out requirements
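
To make the big-endian question in the Decimals notes concrete, here is a small sketch of the same unscaled decimal value laid out in both byte orders. The 16-byte width and the helper name are assumptions for illustration only, not a statement about Arrow's chosen format.

```python
def decimal_bytes(unscaled, byteorder, width=16):
    """Encode a decimal's unscaled integer as fixed-width two's complement."""
    return unscaled.to_bytes(width, byteorder=byteorder, signed=True)


value = 12345  # e.g. 123.45 with scale 2; the scale is kept in the type
big = decimal_bytes(value, "big")
little = decimal_bytes(value, "little")
```

The two encodings hold the same bytes in reverse order, so the choice mainly affects how cheaply a given machine can do arithmetic and comparisons on them, which is what the performance experiments would measure.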

On Wed, Sep 20, 2017 at 11:38 AM, Bill Kehoe wrote:
> I dropped out. Try again.
>
> Bill Kehoe
> Big Data Architect
> Email:  bill.ke...@qlik.com
>
> qlik.com 
>
>
> On 9/20/17, 12:36 PM, "Wes McKinney"  wrote:
>
> The call is full, trying to call back in
>
> On Wed, Sep 20, 2017 at 11:37 AM, Wes McKinney  
> wrote:
> > The sync call will be on
> > 
> https://plus.google.com/hangouts/_/calendar/d2VzbWNraW5uQGdtYWlsLmNvbQ.6mi6ang5bq3ackfm612l4m01c4?authuser=0
> > in a little bit
> >
> > On Tue, Sep 19, 2017 at 6:24 PM, Wes McKinney  
> wrote:
> >> Sync call tomorrow (9/20) @ 16:00 UTC to discuss outstanding issues
> >> and anything related to the next release.
> >>
> >> I will post a Google Hangout link here tomorrow before the call, and
> >> post notes/minutes on the mailing list so that those who are unable to
> >> join can participate in follow up discussions.
>
>


Re: [VOTE] Release Apache Arrow 0.7.1 - RC1

2017-09-29 Thread Wes McKinney
Replying to Kou's comment:

I am not 100% sure -- one way would be to add instructions to

https://github.com/apache/arrow/blob/master/dev/release/VERIFY.md

That way at least people can be explicit about what modifications
they're making to their system. We could also create a script that
creates a standalone directory with Ruby and the requisite system
dependencies (GLib, gobject-introspection, etc.), so that you can
verify the release without making any system modifications.

On Wed, Sep 27, 2017 at 8:42 PM, Kouhei Sutou wrote:
> Hi,
>
> In