Re: [Pharo-users] GSOC 2015 Call for Ideas

Sven Van Caekenberghe Wed, 18 Feb 2015 02:02:21 -0800

OK, try making a proposal then. http://gsoc.pharo.org has the instructions and 
the current list. You probably know more about data science than I do.


> On 18 Feb 2015, at 10:53, Andrea Ferretti <ferrettiand...@gmail.com> wrote:
> 
> I am sorry if the previous messages came off as too harsh. The Neo
> tools are perfectly fine for their intended use.
> 
> What I was trying to say is that a good idea for a SoC project would
> be to develop a framework for data analysis that would be useful for
> data scientists, and in particular this would include something to
> import unstructured data more freely.
> 
> 2015-02-18 10:39 GMT+01:00 Sven Van Caekenberghe <s...@stfx.eu>:
>> Well, you are certainly free to contribute.
>> 
>> Heuristic interpretation of data could be useful, but it looks like an 
>> addition on top; the core library should be fast and efficient.
>> 
>>> On 18 Feb 2015, at 10:35, Andrea Ferretti <ferrettiand...@gmail.com> wrote:
>>> 
>>> For an example of what I am talking about, see
>>> 
>>> http://pandas.pydata.org/pandas-docs/version/0.15.2/io.html#csv-text-files
>>> 
>>> I agree that this is definitely too many options, but it gets the job
>>> done for quick and dirty exploration.
>>> 
>>> The fact is that working with a dump of a table from your db, whose
>>> content you know, requires different tools than exploring the latest
>>> opendata that your local municipality has put online, using yet
>>> another messy format.
>>> 
>>> Enterprise programmers deal more often with the former, data
>>> scientists with the latter, and I think there is room for both kinds of
>>> tools.
>>> 
>>> 2015-02-18 10:26 GMT+01:00 Andrea Ferretti <ferrettiand...@gmail.com>:
>>>> Thank you Sven. I think this should be emphasized and prominent on the
>>>> home page*. Still, libraries such as pandas are even more lenient,
>>>> doing things such as:
>>>> 
>>>> - autodetecting which fields are numeric in CSV files
>>>> - allowing you to fill in missing data based on statistics (for instance,
>>>> you can say: where the field `age` is missing, use the average age)
>>>> 
>>>> Probably there is room for something built on top of Neo
>>>> 
>>>> 
>>>> * by the way, I suggest that the documentation on Neo could benefit
>>>> from a reorganization. Right now, the first topic in the NeoJSON
>>>> paper introduces JSON itself. I would argue that everyone who tries
>>>> to use the library knows what JSON is already. Still, there is no
>>>> example of how to read JSON from a file in the whole document.
>>>> 
>>>> 2015-02-18 10:12 GMT+01:00 Sven Van Caekenberghe <s...@stfx.eu>:
>>>>> 
>>>>>> On 18 Feb 2015, at 09:52, Andrea Ferretti <ferrettiand...@gmail.com> wrote:
>>>>>> 
>>>>>> Also, these tasks
>>>>>> often involve consuming data from various sources, such as CSV and
>>>>>> JSON files. NeoCSV and NeoJSON are still a little too rigid for the
>>>>>> task - libraries like pandas let you just feed in a CSV file and try to
>>>>>> make heads or tails of the content without having to define too much of
>>>>>> a schema beforehand
>>>>> 
>>>>> Both NeoCSV and NeoJSON can operate in two ways: (1) without the 
>>>>> definition of any schemas or (2) with the definition of schemas and 
>>>>> mappings. The quick and dirty explore style is most certainly possible.
>>>>> 
>>>>> 'my-data.csv' asFileReference readStreamDo: [ :in | (NeoCSVReader on: in) upToEnd ].
>>>>> 
>>>>> => an array of arrays
>>>>> 
>>>>> 'my-data.json' asFileReference readStreamDo: [ :in | (NeoJSONReader on: in) next ].
>>>>> 
>>>>> => objects structured using dictionaries and arrays
>>>>> 
>>>>> Sven
>>>>> 
>>>>> 
>>> 
>> 
>> 
>
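
For illustration only, here is a minimal sketch of the two heuristics mentioned in the quoted thread: autodetecting which CSV fields are numeric, and filling in a missing `age` with the average age. It uses only the Python standard library, not pandas or Neo, and it is not how either library implements these features. The sample data and the helper names `is_number`, `infer_numeric_columns` and `fill_missing_with_mean` are made up for this example, and an empty cell is assumed to mean "missing".

```python
import csv
import io
from statistics import mean

# Hypothetical sample data; the empty `age` cell stands for a missing value.
SAMPLE = "name,age,city\nAda,36,London\nBob,,Paris\nCy,24,Rome\n"


def is_number(text):
    """Return True if `text` parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def infer_numeric_columns(rows):
    """Return the names of columns whose non-empty values all parse as numbers."""
    numeric = []
    for column in rows[0]:
        values = [row[column] for row in rows if row[column] != ""]
        if values and all(is_number(value) for value in values):
            numeric.append(column)
    return numeric


def fill_missing_with_mean(rows, column):
    """Convert `column` to float in place, replacing empty cells with the mean."""
    present = [float(row[column]) for row in rows if row[column] != ""]
    average = mean(present)
    for row in rows:
        row[column] = float(row[column]) if row[column] != "" else average
    return rows


# Read every record as a dictionary keyed by the header line, without a schema.
rows = list(csv.DictReader(io.StringIO(SAMPLE)))
numeric_columns = infer_numeric_columns(rows)
for column in numeric_columns:
    fill_missing_with_mean(rows, column)
```

A layer like this could in principle sit on top of a fast, strict reader, which matches the split suggested in the thread: the core parser stays simple, and the guessing happens in a separate pass over the parsed records.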
