Those are just our ideas/suggestions, of course, so if you have any ideas of
your own, make sure to include them.

Best,
Jo


On Fri, Apr 26, 2013 at 1:50 PM, Joachim Daiber <daiber.joac...@gmail.com> wrote:

> Hey Zhiwei,
>
> this is heading in the right direction for this sub-task. Please make sure
> your proposal addresses the following problems:
>
> - Roughly what would the intermediate input format look like?
> - Would the intermediate file go straight into HDFS?
> - How do you plan to limit the negative impact of this abstraction on the
> indexing process? A small performance hit to indexing is likely
> unavoidable, but this is an important issue: even with the current system,
> which reads the XML dumps directly, performance is still a concern, and
> indexing the English version (the largest) still takes several hours on a
> reasonably sized cluster.
>
> Please make sure you understand the differences between the indexing
> pipelines. The Lucene backend is distinct from the pignlproc-based backend
> (what we sometimes call the DB-backed core).
>
> Other than that, I think this sub-task is very achievable and, as such, is
> not sufficient on its own for a project of this size/funding. It also has
> limited benefits for the performance of the system (it will improve the
> system's architecture and flexibility), so we would also like to see some
> work on improving general performance. Finishing the integration of the
> graph-based methods (mentioned on the wiki page) would be a fairly
> straightforward and manageable addition to this.
>
> Best,
> Jo
>
_______________________________________________
Dbpedia-gsoc mailing list
Dbpedia-gsoc@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dbpedia-gsoc
