Well, you are certainly free to contribute.

Heuristic interpretation of data could be useful, but it looks like an addition on
top; the core library should be fast and efficient.

> On 18 Feb 2015, at 10:35, Andrea Ferretti <ferrettiand...@gmail.com> wrote:
> 
> For an example of what I am talking about, see
> 
> http://pandas.pydata.org/pandas-docs/version/0.15.2/io.html#csv-text-files
> 
> I agree that this is definitely too many options, but it gets the job
> done for quick and dirty exploration.
> 
> The fact is that working with a dump of a table from your DB, whose
> content you know, requires different tools than exploring the latest
> open data that your local municipality has put online in yet
> another messy format.
> 
> Enterprise programmers deal more often with the former, data
> scientists with the latter, and I think there is room for both kinds of
> tools.
> 
> 2015-02-18 10:26 GMT+01:00 Andrea Ferretti <ferrettiand...@gmail.com>:
>> Thank you Sven. I think this should be emphasized prominently on the
>> home page*. Still, libraries such as pandas are even more lenient,
>> doing things such as:
>> 
>> - autodetecting which fields are numeric in CSV files
>> - allowing you to fill missing data based on statistics (for instance, you
>> can say: where the field `age` is missing, use the average age)
>> 
>> Probably there is room for something built on top of Neo, along the lines
>> of the sketch below.
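>> 
>> For instance, a rough sketch of the second point, working on the plain
>> array-of-arrays that NeoCSVReader returns; the file name and the age column
>> index are made up, and the empty-cell check assumes missing values arrive as
>> nil or as empty strings:
>> 
>> | rows ageIndex known mean |
>> "Read everything as untyped rows, skipping the header line."
>> rows := 'my-data.csv' asFileReference readStreamDo: [ :in |
>>     (NeoCSVReader on: in) skipHeader; upToEnd ].
>> ageIndex := 3.
>> "Collect the ages that are present and compute their average."
>> known := (rows collect: [ :row | row at: ageIndex ])
>>     reject: [ :value | value isEmptyOrNil ].
>> mean := (known inject: 0 into: [ :sum :each | sum + each asNumber ]) / known size.
>> "Fill in the missing ages with the average."
>> rows do: [ :row |
>>     (row at: ageIndex) isEmptyOrNil
>>         ifTrue: [ row at: ageIndex put: mean ] ].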
>> 
>> 
>> * by the way, I suggest that the documentation on Neo could benefit
>> from a reorganization. Right now, the first topic in the NeoJSON
>> paper introduces JSON itself. I would argue that everyone who tries
>> to use the library already knows what JSON is. Yet there is no
>> example of how to read JSON from a file in the whole document.
>> 
>> 2015-02-18 10:12 GMT+01:00 Sven Van Caekenberghe <s...@stfx.eu>:
>>> 
>>>> On 18 Feb 2015, at 09:52, Andrea Ferretti <ferrettiand...@gmail.com> wrote:
>>>> 
>>>> Also, these tasks
>>>> often involve consuming data from various sources, such as CSV and
>>>> JSON files. NeoCSV and NeoJSON are still a little too rigid for the
>>>> task - libraries like pandas let you just feed in a CSV file and try to
>>>> make heads or tails of the content without having to define much of
>>>> a schema beforehand.
>>> 
>>> Both NeoCSV and NeoJSON can operate in two ways: (1) without the definition
>>> of any schemas or (2) with the definition of schemas and mappings. The
>>> quick-and-dirty exploration style is most certainly possible.
>>> 
>>> 'my-data.csv' asFileReference readStreamDo: [ :in |
>>>     (NeoCSVReader on: in) upToEnd ].
>>> 
>>>  => an array of arrays
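>>> 
>>> For the schema-based style (2), the columns are declared up front; something
>>> along these lines, where Person is a hypothetical class with #name: and
>>> #age: setters:
>>> 
>>> 'my-data.csv' asFileReference readStreamDo: [ :in |
>>>     (NeoCSVReader on: in)
>>>         skipHeader;
>>>         recordClass: Person;
>>>         addField: #name:;
>>>         addIntegerField: #age:;
>>>         upToEnd ].
>>> 
>>>  => an array of Person instances, with the age column parsed as integers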
>>> 
>>> 'my-data.json' asFileReference readStreamDo: [ :in |
>>>     (NeoJSONReader on: in) next ].
>>> 
>>>  => objects structured using dictionaries and arrays
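>>> 
>>> Similarly, with a mapping NeoJSON can materialize objects directly; for
>>> example with the Point mapping (assuming the file holds a single
>>> {"x":...,"y":...} object):
>>> 
>>> 'point.json' asFileReference readStreamDo: [ :in |
>>>     (NeoJSONReader on: in)
>>>         mapInstVarsFor: Point;
>>>         nextAs: Point ].
>>> 
>>>  => a Point instance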
>>> 
>>> Sven
>>> 
>>> 
> 

