[Dbpedia-gsoc] GSoC 2015 - Introduction (Mingzhe)

2015-03-27 Thread Mingzhe Du
Hi everyone,

My name is Mingzhe. I am a PhD student at the University of South Carolina.
My work mainly focuses on Natural Language Processing and NLP-related web
development.

I am the principal developer of Wikitheoria.com [1], an NSF-sponsored
web-based crowdsourcing tool for sharing and collaborating on researchable
sociological ideas. The ultimate goal of this project is to contribute
well-structured sociological information and knowledge to the Linked Data
community. I am proficient in Python and Java on the back end, and in
JavaScript and jQuery on the front end. I have also been using Node.js,
AngularJS and MongoDB in my development work at HelpMonger.com [2].

I am particularly interested in project idea *5.10 DBpedia Metadata
Datasets*. I gained some experience with RDF and SPARQL through coursework
in Natural Language Processing and Service Oriented Computing. I believe
this project will help me gain more experience and knowledge that I could
apply to Wikitheoria in the future. I have submitted my proposal on
http://www.google-melange.com/.

Hoping to work with you soon.

References
[1] http://www.wikitheoria.com
[2] http://www.helpmonger.com


Best,
Mingzhe
--
Dive into the World of Parallel Programming The Go Parallel Website, sponsored
by Intel and developed in partnership with Slashdot Media, is your hub for all
things parallel software development, from weekly thought leadership blogs to
news, videos, case studies, tutorials and more. Take a look and join the 
conversation now. http://goparallel.sourceforge.net/___
Dbpedia-gsoc mailing list
Dbpedia-gsoc@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dbpedia-gsoc


Re: [Dbpedia-gsoc] GSoC 2015 Introduction

2015-03-25 Thread Алексей Степанов
Hi,

Sorry for my silence; it has been two hard weeks at university.

I chose task 5.4.

This is my proposal:
https://docs.google.com/document/d/1TdzP45vntVU4ufTpKcN_ftfE9zIDgh8NBorxZW-Iyf0/edit?usp=sharing

I will wait for a response.

Regards,
Alexey Stepanov



Re: [Dbpedia-gsoc] GSoC 2015 Introduction

2015-03-25 Thread Andre Pereira
Hi,

You need to go to the official GSoC site
https://www.google-melange.com/gsoc/homepage/google/gsoc2015, create a
student profile and submit your proposal there; otherwise you won't be
officially applying for GSoC.

Best regards,
André Pereira







Re: [Dbpedia-gsoc] GSOC 2015 - Introduction

2015-03-23 Thread Thiago Galery
Hi Vasanth, I suggest you take a look at the previous messages in the
mailing list archives and check out the discussion there, so you have a
better idea of what to do. Bear in mind that the submission deadline is
really close, so you'd need to look into this ASAP.
All the best,
Thiago

On Mon, Mar 23, 2015 at 5:07 PM, Vasanth Kalingeri 
vasanth.kaling...@gmail.com wrote:

 Hi,
 My name is Vasanth Kalingeri. I am a 3rd-year undergraduate in computer
 science, pursuing my engineering degree at SJCE Mysore. I have completed
 a machine learning course on Coursera, which led me to an interest in
 NLP. I have also been freelancing for 2 years.
 My interest in NLP grew primarily when I wanted a knowledge base built
 from a given corpus of text, so that it could answer questions on the
 corpus. This led me to DBpedia and further to topic 5.1.
 I am extremely interested in building such a system to extract facts
 from a corpus. I will get working on the warm-up tasks soon.
 Regards,
 Vasanth




Re: [Dbpedia-gsoc] GSoC 2015 - Introduction

2015-03-12 Thread David Przybilla
Hi Shashank,

It looks alright.
I think you can skip the Spark part, as you are not interested in the
project concerning the model building.

As for the specific project you selected, I think the best approach would
be to:

- Understand how a Spotlight model is divided (Surface Form Store, Context
Store, Candidate Store). This blog entry [1] can probably help you, as well
as playing with [2].

- Read the main paper on which Spotlight is based (I mentioned it
previously, and it is also listed in the literature on GitHub).

[1]
http://engineering.idioplatform.com/2015/02/23/spotlight-model-editor.html
[2] https://github.com/idio/spotlight-model-editor
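
To get a feel for how the three stores named above fit together, here is a
minimal sketch. It is only an illustration under stated assumptions: the
class name, the function names, the sample data and the plain-dict layout
are all hypothetical, and the cosine ranking is a stand-in chosen for
brevity, not Spotlight's actual scoring or storage format.

```python
from dataclasses import dataclass, field
from math import sqrt


@dataclass
class ToySpotlightModel:
    # Surface form store: surface form -> probability that it is a link.
    surface_forms: dict = field(default_factory=dict)
    # Candidate store: surface form -> {resource: co-occurrence count}.
    candidates: dict = field(default_factory=dict)
    # Context store: resource -> {context token: count}.
    contexts: dict = field(default_factory=dict)


def cosine(a, b):
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(count * b.get(token, 0) for token, count in a.items())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_candidates(model, surface_form, context_tokens):
    """Return candidate resources for a surface form, best match first."""
    if surface_form not in model.surface_forms:
        return []  # unknown surface form: nothing to disambiguate
    query = {}
    for token in context_tokens:
        query[token] = query.get(token, 0) + 1
    scored = [
        (cosine(query, model.contexts.get(resource, {})), count, resource)
        for resource, count in model.candidates.get(surface_form, {}).items()
    ]
    scored.sort(reverse=True)  # by similarity, then by prior count
    return [resource for _, _, resource in scored]


# Hypothetical sample data, made up for this illustration.
model = ToySpotlightModel(
    surface_forms={"Java": 0.4},
    candidates={"Java": {"Java_(programming_language)": 120, "Java": 80}},
    contexts={
        "Java_(programming_language)": {"code": 5, "compiler": 3, "class": 2},
        "Java": {"island": 4, "indonesia": 3},
    },
)
```

Calling rank_candidates(model, "Java", ["code", "compiler"]) puts the
programming language first, because only its context vector shares tokens
with the query.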

On Thu, Mar 12, 2015 at 1:35 PM, shashank juyal sjuyal...@gmail.com wrote:

 Hi David,

 Please find attached the warm up tasks I have done.
 I am still involved in some of the issues and documentation. I have also
 mentioned those in the pdf.
 Please let me know if any other warm up task has to be done.

 Thanks and Regards,
 Shashank Juyal



 On Sun, Mar 8, 2015 at 12:36 AM, David Przybilla dav.alejan...@gmail.com
 wrote:

 Hi Shashank,

 On DBpedia Spotlight – Better Context Vectors:

 Here are the DBpedia Spotlight warm-up tasks:
 https://github.com/dbpedia-spotlight/dbpedia-spotlight/wiki/Warm-up-tasks

 If you take a look at the GitHub issues page you should find some of the
 problems we are dealing with. One of the ideas could be experimenting with
 word2vec.

 Have a nice weekend :)

 On Sat, Mar 7, 2015 at 11:46 AM, shashank juyal sjuyal...@gmail.com
 wrote:

 Hi,

 I am a Master's student at the International Institute of Information
 Technology, Hyderabad (IIIT-H). I am interested in taking part in this
 year's GSoC. Many of the DBpedia projects sound very familiar and
 interesting to me, as I have worked closely with many of the concepts and
 technologies used in them.

 I have previously worked with Wikipedia data and built a small search
 engine over it based on tf-idf scores and my own parser. I am also
 currently working on a project on question answering techniques using NLP,
 which uses concepts such as word2vec, CBOW, natural language processing
 and translation to a query language, all of which are mentioned in some of
 the DBpedia Spotlight projects.

 Based on this, I would like to work on the following projects:

 1) Fact Extraction from Wikipedia Text
 2) Keyword Search on DBpedia
 3) Deploying a DBpedia Question Answering Engine
 4) DBpedia Spotlight – Better Context Vectors

 Please let me know the warm-up tasks in the above projects.

 LinkedIn Profile: in.linkedin.com/in/shajuyal
 Github Profile: https://github.com/sjuyal

 Thanks and Regards,
 Shashank Juyal




Re: [Dbpedia-gsoc] GSoC 2015 Introduction

2015-03-09 Thread Dimitris Kontokostas
Hi Alexey & welcome to DBpedia!


On Sun, Mar 8, 2015 at 8:10 PM, Алексей Степанов fec...@gmail.com wrote:

 Hi everyone,

 My name is Alex. I'm a first-year postgraduate student (aspirant) at the
 Department of Computational Mathematics and Cybernetics of Moscow State
 University.

 I'm interested in one of the following topics:
 5.4. Mappings freshness & Better statistics / reporting tools
 5.5. Improved Mapping Support for the Mappings Wiki
 5.6. DBpedia Data Error Reporting Tool
 5.8. DBpedia Live scaling & new interface

 I have 2 years of experience in Java programming and a good knowledge of
 SQL. My scientific adviser and I are interested in Semantic Web/Linked
 Open Data and databases, and I want to gain knowledge and experience in
 Scala and JavaScript.

 Can you suggest anything I could work on for the GSoC warm-up that is
 related to topics 5.4 - 5.5?


Please have a look at this thread where we suggest some warm up tasks and
provide more details
http://www.mail-archive.com/dbpedia-gsoc@lists.sourceforge.net/msg00578.html

Cheers,
Dimitris



 Hoping to collaborate with you very soon, even if not in the GSoC program.



 --
 Regards,
 Alexey Stepanov






-- 
Kontokostas Dimitris


Re: [Dbpedia-gsoc] GSoC 2015 Introduction

2015-03-09 Thread Marco Fossati
Hi Robert,

On 3/8/15 8:20 PM, rlits...@mail.uni-mannheim.de wrote:
 Hi Thiago, Hi DBPedia-Team,

 thanks for your reply. I'd like to clarify a fundamental question:

 - In the previous GSoC the participant seem to have built his own
 goldstandard of mappings. Are standard benchmarks for quality
 measurement insufficient
Which standards are you thinking about? Could you reference them?
, i.e. does the schema matching quality vary
 and depend that much on the source schemas used?

 - In my opinion one could tackle this task either practically oriented
 by implementing a promising approach, or research oriented by working
 on an improvement of existing solutions. Which approach is likely the
 way to go?
For the scope of GSoC, I would advocate the former.

 I have done quite some research and gone through a few papers and I
 think I have fair understanding. Are there any particular warm-up task
 related to this task that you could suggest?
Have a look at the unsolved issues in the repo of the GSoC 2014 project:
https://github.com/dbpedia/wikidata-mapper/issues
Those tasks can be applied to the Freebase schema too.

 Best regards

 Robert


Re: [Dbpedia-gsoc] GSoC 2015 - Introduction

2015-03-09 Thread Alexandru Todor
Hi Guido,

Dimitris already gave you some hints on bugs/features you can work on.
What I can give you are some general tasks regarding topic 5.5 Improving
the Mappings Wiki (5.4 has similar requirements):

There are 2 main components you will be working with: the DBpedia mappings
wiki and the server component of the extraction framework.

The mappings wiki is a modified version of MediaWiki. It stores the
mappings between MediaWiki templates and DBpedia classes/properties. Each
template is mapped onto a DBpedia class and each property in the template
is mapped onto a DBpedia ontology property. Whenever an editor saves a
mapping, they have the option of validating it. This option is presented as
a Validate button beside the Save button. Clicking this button executes a
service call to the server component of the DBpedia Extraction Framework.
When the call is made, the contents of the MediaWiki article are passed to
the server, which then checks whether the text conforms to the DBpedia
mappings syntax and validates it. If it passes the validation, the mappings
wiki tells the editor the mapping is valid; otherwise it reports it as not
valid. Of course the mappings wiki does more things, but this is just to
give a quick idea.
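
To make the validate round trip a little more tangible, here is a toy
sketch of the kind of check the server side performs when it receives the
article text. Everything in it is an assumption made for illustration: the
function name validate_mapping, the two checks and the messages are
hypothetical, and the real server parses the full wiki markup and validates
against the ontology rather than using these simple tests.

```python
import re
from typing import Tuple


def validate_mapping(wikitext: str) -> Tuple[bool, str]:
    """Toy syntax check for a mapping page; returns (is_valid, message)."""
    # 1) Template braces must be balanced: every "{{" needs a "}}".
    depth = 0
    i = 0
    while i < len(wikitext) - 1:
        pair = wikitext[i:i + 2]
        if pair == "{{":
            depth += 1
            i += 2
        elif pair == "}}":
            depth -= 1
            if depth < 0:
                return False, "unexpected '}}'"
            i += 2
        else:
            i += 1
    if depth != 0:
        return False, "unclosed '{{'"

    # 2) The page must contain a TemplateMapping that names a target class.
    if not re.search(r"\{\{\s*TemplateMapping\b", wikitext):
        return False, "no TemplateMapping found"
    if not re.search(r"\|\s*mapToClass\s*=\s*\S", wikitext):
        return False, "TemplateMapping has no mapToClass"
    return True, "valid"


# Example mapping text in the style used on the mappings wiki.
example = (
    "{{TemplateMapping\n"
    "| mapToClass = Person\n"
    "| mappings = {{PropertyMapping | templateProperty = name"
    " | ontologyProperty = foaf:name}}\n"
    "}}"
)
```

The wiki side would then only need to show the returned message next to
the Validate button.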

I can give you 2 quick warm-up tasks, with more to follow:

1) Create a MediaWiki extension [1] that hooks into the create/edit
workflow of MediaWiki [2]; you will use the necessary hooks for that.
Insert another button beside Save that calls a REST web service.
2) Get the server module of the extraction framework up and running and
experiment with it. [3] [4] (The documentation is a bit outdated but should
work with minor changes)

[1] http://www.mediawiki.org/wiki/Manual:Developing_extensions
[2] http://www.mediawiki.org/wiki/Manual:Hooks
[3] http://wiki.dbpedia.org/Documentation#h25-10z
[4] http://wiki.dbpedia.org/Server

On Mon, Mar 9, 2015 at 10:08 AM, Dimitris Kontokostas jimk...@gmail.com
wrote:

 Hi Guido & welcome to DBpedia

 Issues 355, 354 & 327 are related to the mappings wiki/server.

 Cheers,
 Dimitris

 On Sat, Mar 7, 2015 at 12:29 PM, Guido Pio Mariotti 
 guidopio.mariott...@gmail.com wrote:

 Hi,
 my name is Guido. I'm a student at the Politecnico of Turin and I am
 currently in the first year of the master's degree in Computer
 Engineering. I'm interested in topics 5.4 and 5.5. I already know Java
 and JavaScript, I'm going to take a PHP course this semester, and I was
 thinking of starting to learn Scala.
 Do you have any suggestions on which bugs/features I could work on for the
 GSoC warm-up that are related to the two topics I'm interested in?

 Hoping to collaborate with you very soon, even if not in the GSoC
 program, I wish you a nice weekend.






 --
 Kontokostas Dimitris




Re: [Dbpedia-gsoc] GSoC 2015 Introduction

2015-03-08 Thread rlitschk
Hi Thiago, Hi DBPedia-Team,

thanks for your reply. I'd like to clarify a fundamental question:

- In the previous GSoC the participant seems to have built his own
gold standard of mappings. Are standard benchmarks for quality
measurement insufficient, i.e. does the schema matching quality vary
and depend that much on the source schemas used?

- In my opinion one could tackle this task either in a practically
oriented way, by implementing a promising approach, or in a research
oriented way, by working on an improvement of existing solutions. Which
approach is likely the way to go?
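
The gold-standard question above comes down to comparing a set of proposed
mappings with a set of mappings accepted as correct. A minimal sketch of
that measurement follows, assuming each mapping is a (source property,
target property) pair; the function name and the sample pairs are made up
for illustration and are not taken from any existing benchmark.

```python
def evaluate_mappings(predicted, gold):
    """Return (precision, recall, F1) of predicted mappings against a gold standard."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Illustrative pairs only; a real gold standard would be curated by hand.
gold = {
    ("fb:people.person.date_of_birth", "dbo:birthDate"),
    ("fb:people.person.place_of_birth", "dbo:birthPlace"),
    ("fb:people.person.nationality", "dbo:nationality"),
}
predicted = {
    ("fb:people.person.date_of_birth", "dbo:birthDate"),
    ("fb:people.person.place_of_birth", "dbo:birthPlace"),
    ("fb:people.person.profession", "dbo:birthPlace"),
    ("fb:people.person.gender", "dbo:nationality"),
}
precision, recall, f1 = evaluate_mappings(predicted, gold)
```

Because the score depends on which pairs the gold standard contains, the
same matcher can score differently on different source schemas, which is
one reason a project-specific gold standard may still be needed.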

I have done quite some research and gone through a few papers, and I
think I have a fair understanding. Are there any particular warm-up tasks
related to this task that you could suggest?

Best regards

Robert

Quoting Thiago Galery tgal...@gmail.com:

 Hi Robert, I would advise taking a look at Marco's response to another
 prospective student. He points to these links for a summary of a similar
 project in 2014


 - idea: http://wiki.dbpedia.org/gsoc2014/ideas#h359-11
 - proposal:
 https://docs.google.com/document/d/16lAqKLAsAGQW0cp9SA0Egb1vlb6mPCcHYezVN-zB870/edit?pli=1
 - stuff done:
 https://github.com/dbpedia/extraction-framework/wiki/GSoC-2014-Progress-Sergey-Skovorodkin


 On Fri, Mar 6, 2015 at 12:04 PM, rlits...@mail.uni-mannheim.de wrote:

 Hello everybody,

 first off, I'd like to introduce myself. I'm Robert, currently a Master's
 student at the University of Mannheim. I'm studying Business Informatics
 and pursuing the Data and Web Science specialization track. One of my
 major interests lies in Data Mining, and I constantly complement my
 studies with Data Mining related online courses (MOOCs) during my free
 time. Alongside my studies I'm also employed as a student researcher at
 the Data and Web Science research group [1] under the supervision of
 Prof. Bizer. You will find many of the group's professors mentioned in
 many of the papers you suggest as a starting point. A major part of the
 research is dedicated to Linked Open Data, hence the education is closely
 tied to examples from research projects.

 Furthermore, during one of my previous internships I was involved in
 building an Active Learning system for Named Entity Recognition, which has
 also enhanced my experience in this field. The first time I got in touch
 with NLP and Machine Learning was during my Bachelor thesis, which was
 concerned with the classification of scientific papers.

 Now coming to the GSoC project:

 My first priority would be to work on 5.7. Reverse Engineering and
 Aligning Freebase with DBpedia. I have a working knowledge of SPARQL and
 the Freebase MQL query language if needed. During my previous semester I
 used DBpedia and Freebase to perform web data integration in a closed
 domain. So I'm aware of schema integration and schema matching procedures,
 which I think qualifies me fairly well, along with my programming
 experience. After digging into the project proposal, some uncertainties
 arose. In the description you mention the introduction of new properties
 and classes if needed. Your first reference [2] is mainly concerned with
 the reduction/fusion of closely related or equivalent properties.

 - Can you give me an intuition of a situation where a need for a new class
 or property would arise?

 - Can you also please give an example of tools that are based on Freebase
 and that should be easily migrated to DBpedia?

 - Speaking of the current approaches to mapping classes and properties, is
 there any work currently going on that deals with hierarchies of subjects
 and objects?

 - Related to [2], do S1 and O1 represent actual subjects and objects or
 the rdf:type classes of S1 and O1? I think one problem could (at least
 partially) solve the other, namely a trustworthy class mapping could
 assist in working out equivalent property mappings and vice versa.
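
One way to read the last question at the instance level is an overlap
check: two properties are candidates for equivalence when the subject and
object pairs they connect coincide after entities are translated through
known links. The sketch below assumes that reading. It is not taken from
[2]; the function name, the identifiers and the sample data are
hypothetical.

```python
def pair_overlap(pairs_a, pairs_b, same_as):
    """Fraction of linked (subject, object) pairs of property A also found under property B.

    pairs_a and pairs_b are iterables of (subject, object) tuples;
    same_as maps entities of dataset A to their counterparts in dataset B.
    """
    pairs_b = set(pairs_b)
    translated = [(same_as.get(s), same_as.get(o)) for s, o in pairs_a]
    # Keep only pairs where both ends have a known counterpart.
    linked = [(s, o) for s, o in translated if s is not None and o is not None]
    if not linked:
        return 0.0
    matches = sum(1 for pair in linked if pair in pairs_b)
    return matches / len(linked)


# Hypothetical identifiers, made up for this illustration.
same_as = {
    "fb:obama": "dbr:Barack_Obama",
    "fb:honolulu": "dbr:Honolulu",
    "fb:merkel": "dbr:Angela_Merkel",
    "fb:hamburg": "dbr:Hamburg",
}
fb_place_of_birth = [
    ("fb:obama", "fb:honolulu"),
    ("fb:merkel", "fb:hamburg"),
    ("fb:unlinked_person", "fb:unlinked_place"),
]
dbo_birth_place = [
    ("dbr:Barack_Obama", "dbr:Honolulu"),
    ("dbr:Angela_Merkel", "dbr:Hamburg"),
]
dbo_death_place = [("dbr:Barack_Obama", "dbr:Washington")]
```

A class-level variant would replace each subject and object by its rdf:type
before comparing, which is where a trusted class mapping would come in.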

 I would be available full-time during the GSoC period, and it goes without
 saying that I will familiarize myself with the latest research before the
 GSoC period starts.

 - Can you please advise me what the next step would be?

 - The project mentioned above is only one of my interests given your
 proposals. Do I have to elaborate on my interest in my second and third
 priorities in a similar way?

 Best regards

 Robert

 [1] http://dws.informatik.uni-mannheim.de/en/home/
 [2] http://wiki.knoesis.org/index.php/Property_Alignment




Re: [Dbpedia-gsoc] GSoC 2015 Introduction and Parallel processing in DBpedia extraction Framework

2015-03-08 Thread Navin Pai
Yup, looking at the changelog of Apache Spark and having worked on
upgrading much smaller applications across Spark versions, I can attest
that this process shouldn't take too much time. The number of breaking
changes is minimal in recent versions.

An idea I had, which I would like feedback on, is having a configuration
picker rather than a list of preconfigured containers/images, along the
lines of Fedora's Revisor project [1]. You could mix and match depending
on the configuration you want to use, and a customized image/container
would be created for you. Of course, the feasibility of this is an open
question...

Honestly, if you ask me, this one project could probably be broken up into
multiple projects, each with a different end goal. Docker brings in a very
interesting set of things to play with, and it would be great if some of
the mentors could provide more feedback on what the end goal of this
specific GSoC project is. :)

Thanks

[1] http://revisor.fedoraunity.org/



Hi Xiao, and welcome!

 Some thoughts from my initial impression and I appreciate your feedback:
  - The project uses Spark 0.9.1 while the latest version of Spark has been
  bumped to 1.2.1. I suppose there will be some work on upgrading it to the
  new version.
 

 It'll perhaps be good to port the code to Spark 1.2.1; I can't imagine
 it'll take too much work because the Spark API has been pretty stable since
 then.


  - It looks like the process is putting the data into HDFS, using Spark to
  extract the data and writing the result back to HDFS. Is there any design
  document for this project?
 

 Yes, but it can also work without HDFS. On a single-node cluster you can
 write directly

to the file system (I'm not sure if there is enough
 documentation on that, but there should be; it's mostly about substituting
 hdfs:///home/user/blah with file:///home/user/blah). On a multi-node
 cluster with NFS you can also work without HDFS.

 I have been meaning to write a proper paper on the project since a few
 months but never managed to get around to it.

 - Spark can works with various distributed file system (S3, GlusterFS, etc)
  not limited to HDFS. So I suppose this could be configurable.


 It'd be a good idea to make this configurable, and I suppose it fits in
 well with the docker containers idea too. Different kinds of configurations
 for EC2/S3, Google Cloud etc.

 Feel free to ask any other questions that you may have while running it.

 Cheers,
 Nilesh

 You can also email me at cont...@nileshc.com or visit my website
 http://nileshc.com/


 On Thu, Mar 5, 2015 at 8:27 PM, Xiao Meng xiaom...@gmail.com wrote:

  Hi,
 
  My name is?
   Xiao, currently a PhD student in Simon Fraser University, Canada.
  ?
 
 
  A little background on myself:
 
  - My research is mainly on data management especially on NoSQL databases.
  - I worked for GSoC 2008 on PostgreSQL [1] when I was an undergraduate
  student:-)
  -
  ?Now ?
  I have been working on some open source projects for one year.
  ?They?
   include Apache Hive[2] and Apache Drill[3], both are SQL-on-Hadoop
  engines. I've
  ?also ?
  played
  ?Apache S?
  park for a while and have some hand-on experiences.
  ?I am learning scala and pretty like it.?
 
  - During the period
  ? of working on Hadoop ecosystem?
  , I gained experience on deploying clusters for dev and test. Docker is a
  great tool for this purpose and I have been building several complex
 docker
  containers [4].
 
  I've heard the
  ?great
   DBpedia project long times ago and always want to play with it:-)
 
  Given my background,  I am pretty interested in the following project:
  ? ?
  Parallel processing in DBpedia extraction Framework
  ?[5]?.
 
 
  Some thoughts from my initial impression and I appreciate your feedback:
 
  ?- The project ?
  uses?
  ? ?
  spark 0.9.1 while the latest version
  ? of spark?
  is bumped to 1.2.1.
  ?
  I suppose there will be some work on upgrade it to the new version.
  ?
  - I
  t looks like the process is putting the data into HDFS, using spark the
  exact data and writing result back to HDFS.
  ?
  Are there any design document for this project?
  - Spark can works with various distributed file system (S3,
 GlusterFS,
  etc) not limited to HDFS. So I suppose this could be configurable.
 
  ?I will try it out in following days.
  ? Any suggestions for evolving this project?
  ?
 
  ?Look forward to contributing to DBpedia!
 
 
  [1] https://wiki.postgresql.org/wiki/GSoC_2008
  [2] https://github.com/xiaom/docker-drill
  [3] https://github.com/apache/hive
  [4] https://github.com/apache/drill
  [5] https://github.com/dbpedia/distributed-extraction-framework



Re: [Dbpedia-gsoc] GSoC 2015 - Introduction

2015-03-07 Thread David Przybilla
Hi Shashank,

On DBpedia Spotlight – Better Context Vectors:

Here are the DBpedia Spotlight warm-up tasks:
https://github.com/dbpedia-spotlight/dbpedia-spotlight/wiki/Warm-up-tasks

If you take a look at the GitHub issues page, you should find some of the
problems we are dealing with. One idea could be to experiment with
word2vec.

Have a nice weekend :)

On Sat, Mar 7, 2015 at 11:46 AM, shashank juyal sjuyal...@gmail.com wrote:

 Hi,

 I am a Master's student at the International Institute of Information
 Technology, Hyderabad (IIIT-H). I am interested in taking part in this
 year's GSoC. Many of the DBpedia projects sound very familiar and
 interesting to me, as I have worked closely with many of the concepts and
 technologies used in them.

 I have previously worked with Wikipedia data and built a small search over
 it, based on tf-idf scores and my own parser. I am also currently working
 on a question answering project using NLP, which uses concepts like
 word2vec, CBOW, natural language processing and translation to a query
 language, all of which are mentioned in some of the DBpedia Spotlight
 projects.

 Based on this, I would like to work on the following projects:

 1) Fact Extraction from Wikipedia Text
 2) Keyword Search on DBpedia
 3) Deploying a DBpedia Question Answering Engine
 4) DBpedia Spotlight – Better Context Vectors

 Please let me know the warm-up tasks for the above projects.

 LinkedIn Profile: in.linkedin.com/in/shajuyal
 Github Profile: https://github.com/sjuyal

 Thanks and Regards,
 Shashank Juyal




Re: [Dbpedia-gsoc] GSoC 2015 Introduction and Parallel processing in DBpedia extraction Framework

2015-03-07 Thread Nilesh Chakraborty
Hi Xiao, and welcome!

 Some thoughts from my initial impression, and I would appreciate your feedback:
 - The project uses Spark 0.9.1 while the latest version of Spark has been
 bumped to 1.2.1. I suppose there will be some work on upgrading it to the
 new version.


It'll perhaps be good to port the code to Spark 1.2.1; I can't imagine
it'll take too much work because the Spark API has been pretty stable since
then.


 - It looks like the process puts the data into HDFS, uses Spark to
 extract the data, and writes the result back to HDFS. Is there any design
 document for this project?


Yes, but it can also work without HDFS. On a single-node cluster you can
write directly to the file system (I'm not sure if there is enough
documentation on that, but there should be; it's mostly about substituting
hdfs:///home/user/blah with file:///home/user/blah). On a multi-node
cluster with NFS you can also work without HDFS.
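
The substitution described above can be sketched as follows. This is only an illustration in Python, not code from the framework; the helper name `switch_scheme` is hypothetical, and the path is the placeholder used above. How the framework itself reads its input and output locations is not shown here.

```python
def switch_scheme(uri: str, new_scheme: str) -> str:
    """Return `uri` with its scheme replaced, leaving the path untouched."""
    _old_scheme, sep, rest = uri.partition("://")
    if not sep:
        # Plain paths without a scheme are rejected rather than guessed at.
        raise ValueError(f"not a scheme-qualified URI: {uri!r}")
    return f"{new_scheme}://{rest}"


if __name__ == "__main__":
    hdfs_path = "hdfs:///home/user/blah"
    # Point the same location at the local (or NFS-mounted) file system.
    print(switch_scheme(hdfs_path, "file"))  # file:///home/user/blah
```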

I have been meaning to write a proper paper on the project for a few
months now but never managed to get around to it.

 - Spark can work with various distributed file systems (S3, GlusterFS,
 etc.), not just HDFS. So I suppose this could be configurable.


It'd be a good idea to make this configurable, and I suppose it fits in
well with the Docker containers idea too: different kinds of configurations
for EC2/S3, Google Cloud, etc.

Feel free to ask any other questions that you may have while running it.

Cheers,
Nilesh

You can also email me at cont...@nileshc.com or visit my website
http://nileshc.com/


On Thu, Mar 5, 2015 at 8:27 PM, Xiao Meng xiaom...@gmail.com wrote:

 Hi,

 My name is Xiao, currently a PhD student at Simon Fraser University, Canada.

 A little background on myself:

 - My research is mainly on data management, especially NoSQL databases.
 - I worked on PostgreSQL for GSoC 2008 [1] when I was an undergraduate
 student :-)
 - I have been working on some open source projects for one year now. They
 include Apache Hive [2] and Apache Drill [3], both SQL-on-Hadoop
 engines. I've also played with Apache Spark for a while and have some
 hands-on experience. I am learning Scala and quite like it.
 - While working on the Hadoop ecosystem, I gained experience in deploying
 clusters for dev and test. Docker is a great tool for this purpose and I
 have been building several complex Docker containers [4].

 I heard of the great DBpedia project a long time ago and have always
 wanted to play with it :-)

 Given my background, I am pretty interested in the following project:
 Parallel processing in DBpedia extraction Framework [5].

 Some thoughts from my initial impression, and I would appreciate your feedback:

 - The project uses Spark 0.9.1 while the latest version of Spark has been
 bumped to 1.2.1. I suppose there will be some work on upgrading it to the
 new version.
 - It looks like the process puts the data into HDFS, uses Spark to
 extract the data, and writes the result back to HDFS. Is there any design
 document for this project?
 - Spark can work with various distributed file systems (S3, GlusterFS,
 etc.), not just HDFS. So I suppose this could be configurable.

 I will try it out in the following days. Any suggestions for evolving this
 project?

 Look forward to contributing to DBpedia!

 [1] https://wiki.postgresql.org/wiki/GSoC_2008
 [2] https://github.com/apache/hive
 [3] https://github.com/apache/drill
 [4] https://github.com/xiaom/docker-drill
 [5] https://github.com/dbpedia/distributed-extraction-framework




[Dbpedia-gsoc] [GSOC 2015] Introduction and Help in starting the contribution

2015-03-03 Thread Nurendra Choudhary
Hi Everybody,

I am Nurendra Choudhary from the International Institute of Information
Technology, Hyderabad, India [1], majoring in Computational Linguistics.
My interests lie in Natural Language Processing, Artificial Intelligence
and Machine Learning. I like coding in Python, C and C++. Here are my
SourceForge [2] and GitHub [3] profiles. I normally go by the name akirato
when working on projects or coding.
I went through the Ideas Page for GSoC 2015 and am really interested in the
Fact Extraction from Wikipedia Text project.
I have some ideas for the project. For example, the first step could be to
find the relations between verbs and the other parts of a sentence
(something like theta roles, maybe), which could then be extended to
finding relations between all pairs of words, and so on.
I have set up the development environment with Eclipse.
Could you help me proceed further with what is required for the project?

[1]http://iiit.ac.in/
[2]http://sourceforge.net/u/akirato/profile/
[3]https://github.com/Akirato/

Regards,
Nurendra Choudhary