>> 1) Is there anyone in the PG community who would be interested in such a
>> project and could be a mentor?
>> 2) These two points share a general idea - to simplify work with large
>> amounts of data from different sources - but maybe it would be better to
>> focus on a single task?
>>
>
> I spent a lot of time on an implementation of (1) - maybe I can find the
> patch somewhere. Both tasks have something in common - you have to divide
> the import into batches.
>

The patch is in /dev/null :( - my implementation was based on
subtransactions of 1000 rows each. When some check failed, I rolled the
subtransaction back and imported every row of that block in its own
subtransaction. It was only a prototype - I didn't look for a smarter
implementation.
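In client-side Python with psycopg2 the control flow would look roughly
like this (the table target(a, b), the batch size, and the import_rows
helper are made up for illustration - the prototype itself was a
server-side patch, so this only mirrors the idea):

    import psycopg2

    BATCH_SIZE = 1000  # rows per subtransaction, as in the prototype above

    def import_rows(conn, rows):
        """Insert rows in batches wrapped in subtransactions (SAVEPOINTs).

        If a batch fails, roll it back and replay each of its rows in its
        own subtransaction, collecting the rows that are rejected."""
        rejected = []
        cur = conn.cursor()
        for start in range(0, len(rows), BATCH_SIZE):
            batch = rows[start:start + BATCH_SIZE]
            cur.execute("SAVEPOINT batch_sp")
            try:
                cur.executemany("INSERT INTO target (a, b) VALUES (%s, %s)",
                                batch)
                cur.execute("RELEASE SAVEPOINT batch_sp")
            except psycopg2.Error:
                # Some row in the batch failed a check: undo the whole
                # batch and retry row by row to isolate the bad rows.
                cur.execute("ROLLBACK TO SAVEPOINT batch_sp")
                for row in batch:
                    cur.execute("SAVEPOINT row_sp")
                    try:
                        cur.execute(
                            "INSERT INTO target (a, b) VALUES (%s, %s)", row)
                        cur.execute("RELEASE SAVEPOINT row_sp")
                    except psycopg2.Error:
                        cur.execute("ROLLBACK TO SAVEPOINT row_sp")
                        rejected.append(row)
        conn.commit()
        return rejected

Per-row subtransactions are expensive, which is why the prototype only
fell back to them when a whole batch actually failed.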
>
>
>> 3) Is it realistic to mostly finish both parts during the 3+ months of
>> almost full-time work, or am I too presumptuous?
>>
>
> I think it is possible - I am not sure about all the details, but a basic
> implementation can be done in 3 months.
> Some data and some checks depend on row order - that can be a problem in
> parallel processing - you should define the corner cases.
>
>
>>
>> I would very much appreciate any comments and criticism.
>>
>>
>> P.S. I know about the very interesting ready-made projects from the PG
>> community, https://wiki.postgresql.org/wiki/GSoC_2017, but it is always
>> more interesting to solve your own problems, issues, and questions,
>> which are the product of your experience with software. That's why I
>> dare to propose my own project.
>>
>> P.P.S. A few words about me: I'm a PhD student in theoretical physics
>> from Moscow, Russia, and have been highly involved in software
>> development since 2010. I believe I have good skills in Python, Ruby,
>> JavaScript, MATLAB, C, and Fortran development, and a basic
>> understanding of algorithm design and analysis.
>>
>>
>> Best regards,
>>
>> Alexey
>>
>