OK, try making a proposal then. http://gsoc.pharo.org has the instructions and the current list. You probably know more about data science than I do.
> On 18 Feb 2015, at 10:53, Andrea Ferretti <ferrettiand...@gmail.com> wrote:
>
> I am sorry if the previous messages came off as too harsh. The Neo
> tools are perfectly fine for their intended use.
>
> What I was trying to say is that a good idea for a SoC project would
> be to develop a framework for data analysis that would be useful for
> data scientists, and in particular this would include something to
> import unstructured data more freely.
>
> 2015-02-18 10:39 GMT+01:00 Sven Van Caekenberghe <s...@stfx.eu>:
>> Well, you are certainly free to contribute.
>>
>> Heuristic interpretation of data could be useful, but it looks like an
>> addition on top; the core library should be fast and efficient.
>>
>>> On 18 Feb 2015, at 10:35, Andrea Ferretti <ferrettiand...@gmail.com> wrote:
>>>
>>> For an example of what I am talking about, see
>>>
>>> http://pandas.pydata.org/pandas-docs/version/0.15.2/io.html#csv-text-files
>>>
>>> I agree that this is definitely too many options, but it gets the job
>>> done for quick and dirty exploration.
>>>
>>> The fact is that working with a dump of a table from your db, whose
>>> content you know, requires different tools than exploring the latest
>>> opendata that your local municipality has put online, using yet
>>> another messy format.
>>>
>>> Enterprise programmers deal more often with the former, data
>>> scientists with the latter, and I think there is room for both kinds
>>> of tools.
>>>
>>> 2015-02-18 10:26 GMT+01:00 Andrea Ferretti <ferrettiand...@gmail.com>:
>>>> Thank you Sven. I think this should be emphasized and prominent on the
>>>> home page*. Still, libraries such as pandas are even more lenient,
>>>> doing things such as:
>>>>
>>>> - autodetecting which fields are numeric in CSV files
>>>> - allowing you to fill in missing data based on statistics (for
>>>>   instance, you can say: where the field `age` is missing, use the
>>>>   average age)
>>>>
>>>> Probably there is room for something built on top of Neo.
>>>>
>>>>
>>>> * By the way, I suggest that the documentation on Neo could benefit
>>>> from a reorganization. Right now, the first topic in the NeoJSON
>>>> paper introduces JSON itself. I would argue that everyone who tries
>>>> to use the library already knows what JSON is. Still, there is no
>>>> example of how to read JSON from a file in the whole document.
>>>>
>>>> 2015-02-18 10:12 GMT+01:00 Sven Van Caekenberghe <s...@stfx.eu>:
>>>>>
>>>>>> On 18 Feb 2015, at 09:52, Andrea Ferretti <ferrettiand...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>> Also, these tasks
>>>>>> often involve consuming data from various sources, such as CSV and
>>>>>> JSON files. NeoCSV and NeoJSON are still a little too rigid for the
>>>>>> task - libraries like pandas let you just feed in a CSV file and try
>>>>>> to make heads or tails of the content without having to define too
>>>>>> much of a schema beforehand
>>>>>
>>>>> Both NeoCSV and NeoJSON can operate in two ways: (1) without the
>>>>> definition of any schemas or (2) with the definition of schemas and
>>>>> mappings. The quick and dirty explore style is most certainly possible.
>>>>>
>>>>> 'my-data.csv' asFileReference readStreamDo: [ :in | (NeoCSVReader on: in) upToEnd ].
>>>>>
>>>>> => an array of arrays
>>>>>
>>>>> 'my-data.json' asFileReference readStreamDo: [ :in | (NeoJSONReader on: in) next ].
>>>>>
>>>>> => objects structured using dictionaries and arrays
>>>>>
>>>>> Sven
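
For reference, the two lenient behaviors mentioned in the thread (autodetecting numeric fields in a CSV file, and filling a missing `age` with the average age) can be sketched roughly as follows. This is only an illustration in plain Python using the standard library; it is not pandas' implementation and not part of NeoCSV. The helper names `to_number`, `read_lenient` and `fill_missing_with_mean`, and the sample data, are made up for this sketch. It also guesses the type value by value, whereas pandas infers a type per column.

```python
import csv
import io
from statistics import mean


def to_number(text):
    """Guess a type for one CSV cell (hypothetical helper).

    Returns None for a blank or absent cell, an int or float when the
    text parses as one, and the stripped string otherwise.
    """
    text = (text or "").strip()
    if text == "":
        return None
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            pass
    return text


def read_lenient(stream):
    """Read a CSV stream with a header row into a list of dicts,
    converting each cell with to_number (no schema defined up front)."""
    reader = csv.DictReader(stream)
    return [{key: to_number(value) for key, value in row.items()}
            for row in reader]


def fill_missing_with_mean(rows, field):
    """Replace None in `field` with the mean of the numeric values
    present in that field. Rows are modified in place and returned."""
    known = [row[field] for row in rows
             if isinstance(row[field], (int, float))]
    if not known:
        return rows  # nothing to average over, leave the gaps alone
    average = mean(known)
    for row in rows:
        if row[field] is None:
            row[field] = average
    return rows


# Made-up sample: Bob's age is missing.
sample = io.StringIO("name,age\nAnn,30\nBob,\nCid,40\n")
rows = fill_missing_with_mean(read_lenient(sample), "age")
print(rows)
```

Something along these lines could sit on top of a fast core reader, as suggested in the thread, without changing the reader itself: the heuristics only post-process the rows it returns.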