[Dbpedia-gsoc] GSoC 2015 - Introduction (Mingzhe)
Hi everyone,

My name is Mingzhe. I am a PhD student at the University of South Carolina. My work mainly focuses on Natural Language Processing and NLP-related web development. I am the principal developer of Wikitheoria.com [1], an NSF-sponsored, web-based crowd-sourcing tool for sharing and collaborating on researchable ideas in sociology. The ultimate goal of this project is to contribute well-structured sociology information and knowledge to our Linked Data community.

I am proficient in Python and Java on the back end, and JavaScript and jQuery on the front end. I have also been using Node.js, AngularJS and MongoDB during my development at HelpMonger.com [2].

I am particularly interested in project idea *5.10 DBpedia Metadata Datasets*. I gained some experience with RDF and SPARQL during my coursework in Natural Language Processing and Service Oriented Computing. I believe this project will help me gain more experience and knowledge that I could apply to Wikitheoria in the future.

I have submitted my proposal on http://www.google-melange.com/. Hoping to work with you soon.

References
[1] http://www.wikitheoria.com
[2] http://www.helpmonger.com

Best,
Mingzhe

--
Dive into the World of Parallel Programming The Go Parallel Website, sponsored by Intel and developed in partnership with Slashdot Media, is your hub for all things parallel software development, from weekly thought leadership blogs to news, videos, case studies, tutorials and more. Take a look and join the conversation now. http://goparallel.sourceforge.net/
___
Dbpedia-gsoc mailing list
Dbpedia-gsoc@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dbpedia-gsoc
Re: [Dbpedia-gsoc] GSoC 2015 Introduction
Hi,

Sorry for my silence; it has been two hard weeks at university. I chose task 5.4. This is my proposal: https://docs.google.com/document/d/1TdzP45vntVU4ufTpKcN_ftfE9zIDgh8NBorxZW-Iyf0/edit?usp=sharing

I will wait for a response.

Regards,
Alexey Stepanov
Re: [Dbpedia-gsoc] GSoC 2015 Introduction
Hi,

You need to go to the official GSoC site (https://www.google-melange.com/gsoc/homepage/google/gsoc2015), create a student profile and submit your proposal there; otherwise you won't be officially applying for GSoC.

Best regards,
André Pereira
Re: [Dbpedia-gsoc] GSOC 2015 - Introduction
Hi Vasanth,

I suggest you take a look at the previous messages in the mailing list archives and check out the discussion there, so you have a better idea of what to do. Bear in mind that the submission date is really close, so you'd need to look into this asap.

All the best,
Thiago

On Mon, Mar 23, 2015 at 5:07 PM, Vasanth Kalingeri vasanth.kaling...@gmail.com wrote:

> Hi,
>
> My name is Vasanth Kalingeri. I am a 3rd-year undergrad in computer science, pursuing my engineering degree at SJCE Mysore. I have completed a course on machine learning on Coursera, which led to my interest in NLP. I have also been freelancing for 2 years.
>
> My interest in NLP grew primarily when I wanted to build a knowledge base from a given corpus of text, so that it could answer questions on the corpus. This led me to DBpedia and then to topic 5.1. I am extremely interested in building such a system to extract facts from a corpus.
>
> I will get working on the warm-up tasks soon.
>
> Regards,
> Vasanth
Re: [Dbpedia-gsoc] GSoC 2015 - Introduction
Hi Shashank,

It looks alright. I think you can skip the Spark part, as you are not interested in the project concerning model building. As for the specific project you selected, I think it would be best to:
- Understand how a Spotlight model is divided (Surface form store, Context Store, Candidate Store). This blog entry [1] can probably help, as can playing with [2].
- Read the main paper on which Spotlight is based (I mentioned it previously, and it is also listed in the literature on GitHub).

[1] http://engineering.idioplatform.com/2015/02/23/spotlight-model-editor.html
[2] https://github.com/idio/spotlight-model-editor

On Thu, Mar 12, 2015 at 1:35 PM, shashank juyal sjuyal...@gmail.com wrote:

> Hi David,
>
> Please find attached the warm-up tasks I have done. I am still involved in some of the issues and documentation; I have also mentioned those in the PDF. Please let me know if any other warm-up task has to be done.
>
> Thanks and Regards,
> Shashank Juyal
Re: [Dbpedia-gsoc] GSoC 2015 Introduction
Hi Alexey, welcome to DBpedia!

On Sun, Mar 8, 2015 at 8:10 PM, Алексей Степанов fec...@gmail.com wrote:

> Hi everyone,
>
> My name is Alex. I'm a first-year postgraduate student (aspirant) at Moscow State University, Department of Computational Mathematics and Cybernetics. I'm interested in one of the following topics:
> 5.4. Mappings freshness Better statistics / reporting tools
> 5.5. Improved Mapping Support for the Mappings Wiki
> 5.6. DBpedia Data Error Reporting Tool
> 5.8. DBpedia Live scaling new interface
>
> I have 2 years of experience in Java programming and good knowledge of SQL. My scientific adviser and I are interested in Semantic Web/Linked Open Data and databases. I also want to gain knowledge and experience in Scala and JavaScript.
>
> Can you suggest anything I could work on for the GSoC warm-up that relates to topics 5.4 - 5.5?

Please have a look at this thread, where we suggest some warm-up tasks and provide more details:
http://www.mail-archive.com/dbpedia-gsoc@lists.sourceforge.net/msg00578.html

Cheers,
Dimitris

> Hoping to collaborate with you very soon, even if not in the GSoC program.
>
> --
> Regards,
> Alexey Stepanov

--
Kontokostas Dimitris
Re: [Dbpedia-gsoc] GSoC 2015 Introduction
Hi Robert,

On 3/8/15 8:20 PM, rlits...@mail.uni-mannheim.de wrote:

> Hi Thiago, hi DBpedia team,
>
> thanks for your reply. I'd like to clarify some fundamental questions:
> - In the previous GSoC the participant seems to have built his own gold standard of mappings. Are standard benchmarks for quality measurement insufficient, i.e. does the schema matching quality vary and depend that much on the source schemas used?

Which standards are you thinking about? Could you reference them?

> - In my opinion one could tackle this task either in a practical way, by implementing a promising approach, or in a research-oriented way, by working on an improvement of existing solutions. Which approach is likely the way to go?

For the scope of GSoC, I would advocate the former.

> I have done quite some research and gone through a few papers, and I think I have a fair understanding. Are there any particular warm-up tasks related to this project that you could suggest?

Have a look at the unsolved issues in the repo of the GSoC 2014 project: https://github.com/dbpedia/wikidata-mapper/issues
Those tasks can be applied to the Freebase schema too.

> Best regards
> Robert
Re: [Dbpedia-gsoc] GSoC 2015 - Introduction
Hi Guido,

Dimitris already gave you some hints on bugs/features you can work on. What I can give you are some general tasks regarding topic 5.5, Improving the Mappings Wiki (5.4 has similar requirements).

There are 2 main components you will be working with: the DBpedia mappings wiki and the server component of the extraction framework.

The mappings wiki is a modified version of MediaWiki. It stores the mappings between MediaWiki templates and DBpedia classes/properties. Each template is mapped onto a DBpedia class, and each property in the template is mapped onto a DBpedia ontology property.

Whenever an editor saves a mapping, they have the option of validating it. This option is presented as a validate button beside the save button. Clicking this button executes a service call to the server component of the DBpedia Extraction Framework. When the call is made, the contents of the MediaWiki article are passed to the server, which checks whether the text conforms to the DBpedia mappings syntax and validates it. If it passes validation, the mappings wiki tells the editor the mapping is valid; otherwise it reports it as not valid. Of course the mappings wiki does more things, but this is just to give you a quick idea.

I can give you 2 quick warm-up tasks, with more to follow:
1) Create a MediaWiki extension [1] that hooks into the create/edit workflow of MediaWiki [2]; you will use the necessary hooks for that. Insert another button beside save that calls a REST web service.
2) Get the server module of the extraction framework up and running and experiment with it [3] [4]. (The documentation is a bit outdated but should work with minor changes.)

[1] http://www.mediawiki.org/wiki/Manual:Developing_extensions
[2] http://www.mediawiki.org/wiki/Manual:Hooks
[3] http://wiki.dbpedia.org/Documentation#h25-10z
[4] http://wiki.dbpedia.org/Server

On Mon, Mar 9, 2015 at 10:08 AM, Dimitris Kontokostas jimk...@gmail.com wrote:

> Hi Guido, welcome to DBpedia!
>
> Issues 355, 354 and 327 are related to the mappings wiki/server.
>
> Cheers,
> Dimitris
>
> On Sat, Mar 7, 2015 at 12:29 PM, Guido Pio Mariotti guidopio.mariott...@gmail.com wrote:
>
>> Hi,
>>
>> my name is Guido. I'm a student at the Politecnico of Turin, currently in the first year of the master's degree in Computer Engineering. I'm interested in topics 5.4 and 5.5. I already know Java and JavaScript, and I'm going to take a PHP course this semester, so I was thinking of starting to learn Scala.
>>
>> Do you have any suggestions on which bugs/features I could work on for the GSoC warm-up that relate to the two topics I'm interested in?
>>
>> Hoping to collaborate with you very soon, even if not in the GSoC program, I wish you a nice weekend.
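The validate step described in the reply to Guido (article text goes to the server, the server answers valid or not valid) can be illustrated with a toy check. This is only a sketch under stated assumptions: `validate_mapping` is a hypothetical name, the two rules it applies (balanced template braces, a `TemplateMapping` with a `mapToClass` assignment) are a drastic simplification, and none of it is the extraction framework's actual validator.

```python
import re
from typing import Tuple


def validate_mapping(wikitext: str) -> Tuple[bool, str]:
    """Toy stand-in for the server-side check (hypothetical, simplified).

    Returns (is_valid, message), mirroring the valid / not valid answer
    that the mappings wiki shows to the editor.
    """
    # Rule 1: every '{{' must be closed by a matching '}}'.
    depth = 0
    i = 0
    while i < len(wikitext) - 1:
        pair = wikitext[i:i + 2]
        if pair == "{{":
            depth += 1
            i += 2
        elif pair == "}}":
            depth -= 1
            if depth < 0:
                return False, "unexpected '}}'"
            i += 2
        else:
            i += 1
    if depth != 0:
        return False, "unclosed '{{'"

    # Rule 2: the article must map the template onto a class.
    if not re.search(r"\{\{\s*TemplateMapping\b", wikitext):
        return False, "no TemplateMapping found"
    if not re.search(r"\|\s*mapToClass\s*=\s*\S+", wikitext):
        return False, "TemplateMapping has no mapToClass"
    return True, "valid"


# Illustrative article text; the class and property names are examples only.
sample = """{{TemplateMapping
| mapToClass = Person
| mappings =
  {{PropertyMapping | templateProperty = name | ontologyProperty = foaf:name }}
}}"""

print(validate_mapping(sample))
print(validate_mapping("{{TemplateMapping | mapToClass = Person"))
```

In the real workflow the extension's extra button would send the article text to the server over a web service call; the sketch keeps everything in one process so it can be run as-is.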
Re: [Dbpedia-gsoc] GSoC 2015 Introduction
Hi Thiago, hi DBpedia team,

thanks for your reply. I'd like to clarify some fundamental questions:
- In the previous GSoC the participant seems to have built his own gold standard of mappings. Are standard benchmarks for quality measurement insufficient, i.e. does the schema matching quality vary and depend that much on the source schemas used?
- In my opinion one could tackle this task either in a practical way, by implementing a promising approach, or in a research-oriented way, by working on an improvement of existing solutions. Which approach is likely the way to go?

I have done quite some research and gone through a few papers, and I think I have a fair understanding. Are there any particular warm-up tasks related to this project that you could suggest?

Best regards
Robert

Quoting Thiago Galery tgal...@gmail.com:

> Hi Robert,
>
> I would advise taking a look at Marco's response to another prospective student. He points to these links for a summary of a similar project in 2014:
> - idea: http://wiki.dbpedia.org/gsoc2014/ideas#h359-11
> - proposal: https://docs.google.com/document/d/16lAqKLAsAGQW0cp9SA0Egb1vlb6mPCcHYezVN-zB870/edit?pli=1
> - stuff done: https://github.com/dbpedia/extraction-framework/wiki/GSoC-2014-Progress-Sergey-Skovorodkin
>
> On Fri, Mar 6, 2015 at 12:04 PM, rlits...@mail.uni-mannheim.de wrote:
>
>> Hello everybody,
>>
>> first off I'd like to introduce myself. I'm Robert, currently a Master's student at the University of Mannheim. I'm studying Business Informatics and pursuing the Data and Web Science specialization track. One of my major interests lies in Data Mining, and I constantly complement my studies with Data Mining related online courses (MOOCs) during my free time.
>>
>> Alongside my studies I'm also employed as a student researcher at the Data and Web Science research group [1] under the supervision of Prof. Bizer. You will find many of the group's professors mentioned in the papers you suggest as a starting point. A major part of the research is dedicated to Linked Open Data, hence the teaching is closely tied to examples from research projects. Furthermore, during one of my previous internships I was involved in building an Active Learning system for Named Entity Recognition, which has also enhanced my experience within this field. The first time I got in touch with NLP and Machine Learning was during my Bachelor thesis, which was concerned with the classification of scientific papers.
>>
>> Now coming to the GSoC project: my first priority would be to work on 5.7. Reverse Engineering and Aligning Freebase with DBpedia. I have a working knowledge of SPARQL and the Freebase MQL query language if needed. During my prior semester I used DBpedia and Freebase to perform web data integration in a closed domain. So I'm aware of schema integration and schema matching procedures, which I think qualifies me fairly well, along with my programming experience.
>>
>> After digging into the proposal of the project, some uncertainties arose. In the description you mention the introduction of new properties and classes if needed. Your first reference [2] is mainly concerned with the reduction/fusion of closely related or equivalent properties.
>> - Can you give me an intuition of a situation where a need for a new class or property would arise?
>> - Can you also please give an example of tools that are based on Freebase and that should be easily migrated to DBpedia?
>> - Speaking of the current approaches to mapping classes and properties, is there any work currently going on that deals with hierarchies of subjects and objects?
>> - Related to [2], do S1 and O1 represent actual subjects and objects, or the rdf:type classes of S1 and O1?
>>
>> I think one problem could (at least partially) solve the other: a trustworthy class mapping could assist in working out equivalent property mappings, and vice versa.
>>
>> I would be available full-time during the GSoC period, and it goes without saying that I will get up to speed on the latest research prior to the start of the GSoC period.
>> - Can you please advise me what the next step would be?
>> - The project mentioned above is only one of my interests among your proposals. Do I have to elaborate on my interest in my second and third priorities in a similar way?
>>
>> Best regards
>> Robert
>>
>> [1] http://dws.informatik.uni-mannheim.de/en/home/
>> [2] http://wiki.knoesis.org/index.php/Property_Alignment
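Robert's remark that trusted mappings on one level can help work out equivalent property mappings can be made concrete with a small sketch. It assumes an extension-based view of matching: two properties are candidates for equivalence when many of their subject-object pairs coincide once entities have been translated between the datasets. The function `overlap_score`, the `same_as` links and all identifiers below are made up for illustration; this is not the method of [2] and not an existing DBpedia tool.

```python
from collections import defaultdict


def overlap_score(triples_a, triples_b, same_as):
    """Score property pairs by shared (subject, object) pairs.

    triples_a, triples_b: iterables of (subject, property, object).
    same_as: dict translating dataset-A identifiers to dataset-B identifiers.
    Returns {(prop_a, prop_b): share of prop_a's translated pairs that also
    occur under prop_b}; pairs with no overlap are omitted.
    """
    pairs_b = defaultdict(set)
    for s, p, o in triples_b:
        pairs_b[p].add((s, o))

    pairs_a = defaultdict(set)
    for s, p, o in triples_a:
        # Only statements whose both ends can be translated are comparable.
        if s in same_as and o in same_as:
            pairs_a[p].add((same_as[s], same_as[o]))

    scores = {}
    for prop_a, ext_a in pairs_a.items():
        for prop_b, ext_b in pairs_b.items():
            shared = len(ext_a & ext_b)
            if shared:
                scores[(prop_a, prop_b)] = shared / len(ext_a)
    return scores


# Hypothetical toy data, not taken from Freebase or DBpedia dumps.
freebase = [
    ("fb:m1", "fb:place_of_birth", "fb:m2"),
    ("fb:m3", "fb:place_of_birth", "fb:m4"),
    ("fb:m1", "fb:nationality", "fb:m5"),
]
dbpedia = [
    ("dbr:A", "dbo:birthPlace", "dbr:B"),
    ("dbr:C", "dbo:birthPlace", "dbr:D"),
    ("dbr:A", "dbo:deathPlace", "dbr:B"),
]
same_as = {"fb:m1": "dbr:A", "fb:m2": "dbr:B", "fb:m3": "dbr:C",
           "fb:m4": "dbr:D", "fb:m5": "dbr:E"}

for pair, score in sorted(overlap_score(freebase, dbpedia, same_as).items()):
    print(pair, score)
```

On this toy data the birth-place properties overlap completely, while the death-place property overlaps only partly, which is the kind of signal a real approach would then have to threshold and check against class information.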
Re: [Dbpedia-gsoc] GSoC 2015 Introduction and Parallel processing in DBpedia extraction Framework
Yup, looking at the changelog of Apache Spark and having worked on upgrading much smaller applications across Spark versions, I can attest that this process shouldn't take too much time. The number of breaking changes is very small in recent versions.

An idea I had, which I would like feedback on, is having a configuration picker rather than a list of preconfigured containers/images, kind of along the lines of Fedora's Revisor project [1]. You could mix and match depending on the configuration you want to use, and a customized image/container is created for you. Of course, the feasibility of this is an open question...

Honestly, if you ask me, this one project could probably be broken up into multiple projects, each with a different end goal. Docker brings in a very interesting set of things to play with, and it would be great if some of the mentors could provide more feedback on what the end goal of this specific GSoC project is. :)

Thanks

[1] http://revisor.fedoraunity.org/

> Hi Xiao, and welcome!
>
>> Some thoughts from my initial impression and I appreciate your feedback:
>> - The project uses Spark 0.9.1 while the latest version of Spark is bumped to 1.2.1. I suppose there will be some work on upgrading it to the new version.
>
> It'll perhaps be good to port the code to Spark 1.2.1; I can't imagine it'll take too much work because the Spark API has been pretty stable since then.
>
>> - It looks like the process is putting the data into HDFS, using Spark to extract the data and writing the result back to HDFS. Are there any design documents for this project?
>
> Yes, but it can also work without HDFS. On a single-node cluster you can write directly to the file system (I'm not sure if there is enough documentation on that, but there should be; it's mostly about substituting hdfs:///home/user/blah with file:///home/user/blah). On a multi-node cluster with NFS you can also work without HDFS. I have been meaning to write a proper paper on the project for a few months but never managed to get around to it.
>
>> - Spark can work with various distributed file systems (S3, GlusterFS, etc.), not limited to HDFS. So I suppose this could be configurable.
>
> It'd be a good idea to make this configurable, and I suppose it fits in well with the Docker containers idea too: different kinds of configurations for EC2/S3, Google Cloud etc.
>
> Feel free to ask any other questions that you may have while running it.
>
> Cheers,
> Nilesh
>
> You can also email me at cont...@nileshc.com or visit my website http://nileshc.com/
>
> On Thu, Mar 5, 2015 at 8:27 PM, Xiao Meng xiaom...@gmail.com wrote:
>
>> Hi,
>>
>> My name is Xiao, currently a PhD student at Simon Fraser University, Canada.
>>
>> A little background on myself:
>> - My research is mainly on data management, especially NoSQL databases.
>> - I worked for GSoC 2008 on PostgreSQL [1] when I was an undergraduate student :-)
>> - I have now been working on some open source projects for one year. They include Apache Hive [2] and Apache Drill [3]; both are SQL-on-Hadoop engines. I've also played with Apache Spark for a while and have some hands-on experience. I am learning Scala and quite like it.
>> - While working on the Hadoop ecosystem, I gained experience deploying clusters for dev and test. Docker is a great tool for this purpose and I have been building several complex Docker containers [4].
>>
>> I heard about the great DBpedia project a long time ago and have always wanted to play with it :-) Given my background, I am pretty interested in the following project: Parallel processing in DBpedia extraction Framework [5].
>>
>> Some thoughts from my initial impression and I appreciate your feedback:
>> - The project uses Spark 0.9.1 while the latest version of Spark is bumped to 1.2.1. I suppose there will be some work on upgrading it to the new version.
>> - It looks like the process is putting the data into HDFS, using Spark to extract the data and writing the result back to HDFS. Are there any design documents for this project?
>> - Spark can work with various distributed file systems (S3, GlusterFS, etc.), not limited to HDFS. So I suppose this could be configurable.
>>
>> I will try it out in the following days. Any suggestions for evolving this project?
>>
>> Look forward to contributing to DBpedia!
>>
>> [1] https://wiki.postgresql.org/wiki/GSoC_2008
>> [2] https://github.com/apache/hive
>> [3] https://github.com/apache/drill
>> [4] https://github.com/xiaom/docker-drill
>> [5] https://github.com/dbpedia/distributed-extraction-framework
Re: [Dbpedia-gsoc] GSoC 2015 - Introduction
Hi Shashank,
On DBpedia Spotlight – Better Context Vectors: here are the DBpedia Spotlight warm-up tasks: https://github.com/dbpedia-spotlight/dbpedia-spotlight/wiki/Warm-up-tasks
If you take a look at the GitHub issue page you should find some of the problems we are dealing with. One of the ideas could be experimenting with word2vec.
Have a nice weekend :)
On Sat, Mar 7, 2015 at 11:46 AM, shashank juyal sjuyal...@gmail.com wrote:
Hi, I am a Masters student at the International Institute of Information Technology, Hyderabad (IIIT-H). I am interested in taking part in this year's GSoC. Many of the DBpedia projects sound very familiar and interesting to me, as I have worked closely with many of the concepts and technologies they use. I have previously worked with Wikipedia data and built a small search engine over it based on tf-idf scores and my own parser. I am also currently working on a question answering project using NLP, which uses concepts such as word2vec, CBOW, natural language processing and translation to a query language, all of which are mentioned in some of the DBpedia Spotlight projects.
Based on this, I would like to work on the following projects:
1) Fact Extraction from Wikipedia Text
2) Keyword Search on DBpedia
3) Deploying a DBpedia Question Answering Engine
4) DBpedia Spotlight – Better Context Vectors
Please let me know the warm-up tasks for the above projects.
LinkedIn Profile: in.linkedin.com/in/shajuyal
Github Profile: https://github.com/sjuyal
Thanks and Regards, Shashank Juyal
Re: [Dbpedia-gsoc] GSoC 2015 Introduction and Parallel processing in DBpedia extraction Framework
Hi Xiao, and welcome!
Some thoughts from my initial impression; I would appreciate your feedback:
- The project uses Spark 0.9.1 while the latest version of Spark has been bumped to 1.2.1. I suppose there will be some work on upgrading it to the new version.
It'll perhaps be good to port the code to Spark 1.2.1; I can't imagine it'll take too much work because the Spark API has been pretty stable since then.
- It looks like the process is putting the data into HDFS, using Spark to extract the data and writing the result back to HDFS. Is there any design document for this project?
Yes, but it can also work without HDFS. On a single-node cluster you can write directly to the file system (I'm not sure if there is enough documentation on that, but there should be; it's mostly about substituting hdfs:///home/user/blah with file:///home/user/blah). On a multi-node cluster with NFS you can also work without HDFS. I have been meaning to write a proper paper on the project for a few months but never managed to get around to it.
- Spark can work with various distributed file systems (S3, GlusterFS, etc.), not only HDFS. So I suppose this could be configurable.
It'd be a good idea to make this configurable, and I suppose it fits in well with the Docker containers idea too: different kinds of configurations for EC2/S3, Google Cloud, etc.
Feel free to ask any other questions that you may have while running it.
Cheers, Nilesh
You can also email me at cont...@nileshc.com or visit my website http://nileshc.com/
On Thu, Mar 5, 2015 at 8:27 PM, Xiao Meng xiaom...@gmail.com wrote:
Hi, My name is Xiao, currently a PhD student at Simon Fraser University, Canada. A little background on myself:
- My research is mainly on data management, especially NoSQL databases.
- I worked for GSoC 2008 on PostgreSQL [1] when I was an undergraduate student :-)
- I have now been working on some open source projects for one year. They include Apache Hive [2] and Apache Drill [3], both of which are SQL-on-Hadoop engines. I've also played with Apache Spark for a while and have some hands-on experience. I am learning Scala and like it a lot.
- While working on the Hadoop ecosystem, I gained experience in deploying clusters for dev and test. Docker is a great tool for this purpose and I have been building several complex Docker containers [4].
I heard of the great DBpedia project a long time ago and have always wanted to play with it :-) Given my background, I am very interested in the following project: Parallel processing in DBpedia extraction Framework [5].
Some thoughts from my initial impression; I would appreciate your feedback:
- The project uses Spark 0.9.1 while the latest version of Spark has been bumped to 1.2.1. I suppose there will be some work on upgrading it to the new version.
- It looks like the process is putting the data into HDFS, using Spark to extract the data and writing the result back to HDFS. Is there any design document for this project?
- Spark can work with various distributed file systems (S3, GlusterFS, etc.), not only HDFS. So I suppose this could be configurable.
I will try it out in the following days. Any suggestions for evolving this project? Looking forward to contributing to DBpedia!
[1] https://wiki.postgresql.org/wiki/GSoC_2008
[2] https://github.com/apache/hive
[3] https://github.com/apache/drill
[4] https://github.com/xiaom/docker-drill
[5] https://github.com/dbpedia/distributed-extraction-framework
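The hdfs:/// to file:/// substitution discussed in this thread, and the idea of making the storage back end configurable, could be sketched roughly as follows. This is only an illustration in plain Python: the function names, the set of accepted schemes, and the "input"/"output" directory names are hypothetical and are not taken from the distributed-extraction-framework code, whose actual configuration may well differ.

```python
# Hypothetical whitelist; a real setup might add other schemes
# that the cluster's Hadoop/Spark installation understands.
SUPPORTED_SCHEMES = {"hdfs", "file"}


def with_scheme(uri: str, scheme: str) -> str:
    """Return `uri` with its scheme replaced, keeping everything after '://'."""
    if scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"unsupported scheme: {scheme}")
    _old_scheme, sep, rest = uri.partition("://")
    if not sep:
        raise ValueError(f"not a URI with a scheme: {uri}")
    return f"{scheme}://{rest}"


def job_paths(base_uri: str, storage: str) -> dict:
    """Build hypothetical input/output locations on the chosen storage."""
    base = with_scheme(base_uri, storage)
    if not base.endswith("/"):
        base += "/"
    return {"input": base + "input", "output": base + "output"}


if __name__ == "__main__":
    # Same base directory, once on HDFS and once on the local file system.
    print(job_paths("hdfs:///home/user/blah", "hdfs"))
    print(job_paths("hdfs:///home/user/blah", "file"))
```

Keeping the scheme as a single setting means the same job description could be pointed at HDFS on a cluster or at the local file system on a single node without touching the rest of the paths.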
[Dbpedia-gsoc] [GSOC 2015] Introduction and Help in starting the contribution
Hi Everybody,
I am Nurendra Choudhary from the International Institute of Information Technology, Hyderabad, India [1], majoring in Computational Linguistics. My interests lie in Natural Language Processing, Artificial Intelligence and Machine Learning. I like coding in Python, C and C++. Here are my SourceForge [2] and Github [3] profiles. I normally go by the name akirato when doing projects or any coding.
I went through the Ideas Page for GSoC 2015 and am really interested in the Fact Extraction from Wikipedia Text project. I have some ideas for it. For example, the first step could be to find the relations between verbs and the other parts of the sentence (something like theta roles, maybe), which could then be developed further into finding relations between all pairs of words, and so on.
I have set up the development environment with Eclipse. Could you help me proceed with whatever else is required for the project?
[1] http://iiit.ac.in/
[2] http://sourceforge.net/u/akirato/profile/
[3] https://github.com/Akirato/
Regards, Nurendra Choudhary