Re: [Dbpedia-gsoc] GSoC 2015 Introduction and Parallel processing in DBpedia extraction Framework

2015-03-16 Thread Navin Pai
Hi Dimitris,
Thanks for clarifying the thought process behind the project. Correct me if
I'm wrong but what we're aiming for is to allow people to play with dbpedia
projects using a single straightforward 'docker run' or 'spark-submit'
command right?

Spark announced its 1.3.0 release a couple of days ago [1]. I have a single
node cluster running Hadoop 2.6 and Spark 1.2.1. I'll upgrade my version of
Spark and try to port the code to the newer versions. I'm hoping there
won't be too many roadblocks. I'll keep the mailing list updated on how it
goes :)

[1] https://spark.apache.org/news/spark-1-3-0-released.html

On Tue, Mar 10, 2015 at 3:58 PM, Dimitris Kontokostas 
wrote:

> Hi Navin,
>
> On Sun, Mar 8, 2015 at 1:46 PM, Navin Pai  wrote:
>
>> Yup, looking at the changelog of Apache Spark and having worked on
>> upgrading much smaller applications across Spark versions, I can attest
>> that this process shouldn't take too much time. The number of breaking
>> changes is minimal in recent versions.
>>
>
> Maybe this could be a warm up task for this project
>
>
>> An idea I had, which I would like feedback on is having a configuration
>> picker, rather than a list of preconfigured container/images. Kind of along
>> the lines of Fedora's Revisor project [1]. You could mix and match
>> depending on the configuration you want to use and a customized
>> image/container is created for you. Of course, the feasibility of this is
>> an open question...
>>
>
> This sounds like a good idea, but I would give it a lower priority and
> try it if there is time left at the end of the project.
>
>
>> Honestly, if you ask me, this one project could probably be broken up
>> into multiple projects, each with a different end goal. Docker brings in a
>> very interesting set of things to play with, and it would be great if some
>> of the mentors could provide more feedback on what the end goal of this
>> specific GSoC project is. :)
>>
>
> We are trying to bring DBpedia closer to industry related / big data
> projects and the preconfigured images or easy configurable scripts are a
> step towards industry adoption. So the idea is to give people tools to
> easily experiment with the code & data and see if they can invest more time
> to port it in their software stack.
> Another goal is to make it easy to run on a single-node cluster. Some
> preliminary results from Nilesh showed a big boost in extraction time even
> on a single machine due to better utilization of the HD so this could speed
> up our static releases.
>
> Best,
> Dimitris
>
>
>>
>> Thanks
>>
>> [1] http://revisor.fedoraunity.org/
>>
>>
>>
>> Hi Xiao, and welcome!
>>>
>>> Some thoughts from my initial impression and I appreciate your feedback:
>>> > - The project uses Spark 0.9.1 while the latest version of Spark is
>>> > bumped to 1.2.1. I suppose there will be some work on upgrading it to
>>> > the new version.
>>> >
>>>
>>> It'll perhaps be good to port the code to Spark 1.2.1; I can't imagine
>>> it'll take too much work because the Spark API has been pretty stable
>>> since then.
>>>
>>>
>>> > - It looks like the process is putting the data into HDFS, using Spark
>>> > to extract the data and writing the result back to HDFS. Are there any
>>> > design documents for this project?
>>> >
>>>
>>> Yes, but it can also work without HDFS. On a single-node cluster you can
>>> write directly to the file system (I'm not sure if there is enough
>>> documentation on that, but there should be; it's mostly about
>>> substituting hdfs:///home/user/blah with file:///home/user/blah). On a
>>> multi-node cluster with NFS you can also work without HDFS.
>>>
>>> I have been meaning to write a proper paper on the project for a few
>>> months but never managed to get around to it.
>>>
>>> > - Spark can work with various distributed file systems (S3, GlusterFS,
>>> > etc.), not limited to HDFS. So I suppose this could be configurable.
>>>
>>>
>>> It'd be a good idea to make this configurable, and I suppose it fits in
>>> well with the Docker containers idea too. Different kinds of
>>> configurations for EC2/S3, Google Cloud etc.
>>>
>>> Feel free to ask any other questions that you may have while running it.
>>>
>>> Cheers,
>>> Nilesh
>>>
>>> You can also email me at cont...@nileshc.com or visit my website
>>> 
>>>
>>>
>>> On Thu, Mar 5, 2015 at 8:27 PM, Xiao Meng  wrote:
>>>
>>> > Hi,
>>> >
>>> > My name is Xiao, currently a PhD student in Simon Fraser University,
>>> > Canada.
>>> >
>>> > A little background on myself:
>>> >
>>> > - My research is mainly on data management especially on NoSQL
>>> > databases.
>>> > - I worked for GSoC 2008 on PostgreSQL [1] when I was an undergraduate
>>> > student :-)
>>> > - I have now been working on some open source projects for one year.
>>> > They include Apache Hive [2] and Apache Drill [3], both SQL-on-Hadoop
>>> > engines. I've also
>

Re: [Dbpedia-gsoc] GSoC 2015 Introduction and Parallel processing in DBpedia extraction Framework

2015-03-16 Thread Dimitris Kontokostas
On Mon, Mar 16, 2015 at 8:59 AM, Navin Pai  wrote:

> Hi Dimitris,
> Thanks for clarifying the thought process behind the project. Correct me
> if I'm wrong but what we're aiming for is to allow people to play with
> dbpedia projects using a single straightforward 'docker run' or
> 'spark-submit' command right?
>

I wish it could be that simple :) but the idea is to make it as
straightforward as possible


> Spark announced its 1.3.0 release a couple of days ago [1]. I have a
> single node cluster running Hadoop 2.6 and Spark 1.2.1. I'll upgrade my
> version of Spark and try to port the code to the newer versions. I'm hoping
> there won't be too many roadblocks. I'll keep the mailing list updated on
> how it goes :)
>

The idea behind this task is to get you familiar with the code and help you
write a better application.
The upgrade does not have to succeed for the warm-up to be useful; you could
use v1.2.1 as well, or any other version.


Cheers,
Dimitris

>
> [1] https://spark.apache.org/news/spark-1-3-0-released.html

Re: [Dbpedia-gsoc] GSOC Introduction

2015-03-16 Thread Marco Fossati
Hi Ankush,

On 3/14/15 10:12 PM, Ankush Jindal wrote:
> Hi Mentors,
>
> I read about the fact extractor project from the idea page, and I would
> like to work on it.
Sounds good.
The project has received quite a lot of interest, so please first read
all the ongoing discussions in this mailing list.
> I am Ankush Jindal, pursuing bachelors in Computer Science from IIT
> Mandi. I have worked this winters on Natural Language Processing
> (sentiment analysis and feature extraction for Hotel reviews) and I feel
> that I would be fit for this project.
> I have a little hesitation that I am a little late in discussing the
> project and I am really sorry for it. However, I make sure that I will
> work on the warm-up tasks and go through the code as soon as I could. If
> you could advise me some particular warm-up tasks or some other exercise
> that you require from me, given the time-frame, I would be more than
> happy to do so.
Everything is here:
https://github.com/dbpedia/fact-extractor/issues
Since lots of people took issue #1, I would suggest focusing on the
other ones, especially those that have a higher difficulty (check out
the tags).

Cheers!
>
> //
> /Ankush Jindal/
> /Student, IIT Mandi, India
> Phone: +91-9805901195/
> /Facebook: @/jindalankush95 
> Github: @travis-bickle 
>
>
> --
> Dive into the World of Parallel Programming The Go Parallel Website, sponsored
> by Intel and developed in partnership with Slashdot Media, is your hub for all
> things parallel software development, from weekly thought leadership blogs to
> news, videos, case studies, tutorials and more. Take a look and join the
> conversation now. http://goparallel.sourceforge.net/
>
>
>
> ___
> Dbpedia-gsoc mailing list
> Dbpedia-gsoc@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/dbpedia-gsoc
>

-- 
Marco Fossati
http://about.me/marco.fossati
Twitter: @hjfocs
Skype: hell_j



Re: [Dbpedia-gsoc] GSOC newbie

2015-03-16 Thread Marco Fossati
Hi Amitrajit,

On 3/14/15 8:12 PM, Amitrajit Sarkar wrote:
> hi Dimitris,
>
> thanks for the tip. in the meanwhile I shall read up and keep working on
> the warm up tasks, then..
>
> cheers,
> Amitrajit
>
> On Sat, Mar 14, 2015 at 8:09 PM, Dimitris Kontokostas wrote:
>
> Hi Amitrajit & welcome,
>
> Until the application period we keep all conversations public and we
> discuss any question about the project in this mailing list.
> There is already some discussion on this project if you search the
> ml archives and you are welcome to ask more detailed questions.
> Marco, the mentor for this project will happily guide you
>
> Cheers,
> Dimitris
>
> On Sat, Mar 14, 2015 at 4:22 PM, Amitrajit Sarkar
> <aaiijm...@gmail.com> wrote:
>
> hi..
>
> my name is Amitrajit. I am a CS undergraduate student from
> Jadavpur University, India. I'm fluent in C, C++, Java, Python,
> and have been working on Natural Language Processing and
> Artificial Intelligence for a while now. but this is my first
> time applying for Google Summer of Code. the ideas: 'fact
> extraction from Wikipedia text' and 'reverse engineering and
> aligning Freebase with DBpedia' caught my attention. I dropped
> an email yesterday introducing myself. since then, I went ahead
> and tried out one of the warmup tasks on dbpedia/fact-extractor.
> I've issued a pull request on GitHub..
Saw that, thanks!
Check out my comments directly on the pull request conversation.
>
> to the best of my understanding (which may not be much), fact
> extraction would be an unsupervised (or semisupervised)
> dependency parsed pattern interpretation,
Nope, it will be fully supervised, possibly backed by distant
supervision, as another potential candidate has interestingly pointed out.
Dependency parsing is currently not needed, since it's more a matter of
chunking/entity linking.
> whereas database
> alignment would be a matter of finding and linking (and
> sometimes creating) common vertices and edges on the knowledge
> graph. but I was hoping I'd be able to talk to someone about the
> projects..
>
> any help would be welcome. thank you..
>
> 
>
>
>
>
> --
> Kontokostas Dimitris
>
>
>
>
>

-- 
Marco Fossati
http://about.me/marco.fossati
Twitter: @hjfocs
Skype: hell_j



Re: [Dbpedia-gsoc] GSoC 2015 questions

2015-03-16 Thread Axel Ngonga

Hello Daniel,

Great to meet you! And sorry for the delay, but this week has been rather
busy. For the keyword search and QA, please first read the papers
pointed out in the description of the keyword search task [1]
(especially SESSA, which is the key paper for this task) to get an idea
of existing solutions. The warmup task for keyword search would consist
of implementing a module for loading a fact-based representation of
DBpedia into a database (this could be a graph database or a triple store;
I think a graph database would work best due to the spreading activation
paradigm that we aim to use). We will then aim to implement an
efficient spreading activation based on this representation. This step
is necessary to implement the keyword-based search approach envisioned.
Thereafter, we can have a look at the dictionaries necessary to ground
the entries of the users. For question answering, the warmup task would
consist of creating a lookup solution for classes, properties and
resources.
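
As a rough illustration of the spreading activation idea mentioned above
(not the SESSA algorithm itself, whose details are in the papers), here is
a minimal Python sketch. The toy graph, the decay factor, the step count and
the function name spread_activation are all assumptions made for the example;
a real implementation would read its edges from the graph database or triple
store.

```python
from collections import defaultdict


def spread_activation(graph, seeds, decay=0.5, steps=2):
    """Propagate activation from seed nodes along outgoing edges.

    graph: dict mapping a node to the list of nodes it links to.
    seeds: nodes that start with activation 1.0 (e.g. matched keywords).
    decay: fraction of a node's activation passed on per step (assumed).
    """
    activation = defaultdict(float)
    for node in seeds:
        activation[node] = 1.0

    frontier = dict(activation)
    for _ in range(steps):
        next_frontier = defaultdict(float)
        for node, value in frontier.items():
            neighbours = graph.get(node, [])
            if not neighbours:
                continue
            # Split the decayed activation evenly among the neighbours.
            share = value * decay / len(neighbours)
            for neighbour in neighbours:
                next_frontier[neighbour] += share
        for node, value in next_frontier.items():
            activation[node] += value
        frontier = next_frontier
    return dict(activation)


# Hypothetical fact-based fragment: resource -> linked resources.
toy_graph = {
    "Berlin": ["Germany", "Prussia"],
    "Germany": ["Europe"],
    "Prussia": ["Germany"],
}
scores = spread_activation(toy_graph, seeds=["Berlin"])
```

Nodes reachable over several paths (here "Germany") accumulate more
activation than nodes reachable over one, which is the property a
keyword search would rank on.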


Best regards,
Axel

[1] http://wiki.dbpedia.org/gsoc2015/ideas#h460-5

Hello everyone.
My name is Daniel, I'm currently a third-year student pursuing a
Bachelor of Applied Math degree at Odessa National University in Ukraine.
I'm interested in contributing to the DBpedia project, because I believe
that pushing the boundaries of mankind's knowledge is important,
and artificial intelligence in general and knowledge bases in
particular are among the most promising fields of science.
I have 2.5 years of programming experience in general and 1 year of
experience in Java programming in particular (Android apps and high-load
web services development). I want to improve DBpedia's QA engine, but I
can also work on the search engine or on the statistics and error
reporting tools. How can I learn in detail about these tasks (requirements,
desirable outcome etc.)?

Thanks in advance,
Daniel.





--
Axel Ngonga, Dr. rer. nat
Head of AKSW
Augustusplatz 10
Room P905
04109 Leipzig
http://aksw.org/AxelNgonga

Tel: +49 (0)341 9732341
Fax: +49 (0)341 9732239



Re: [Dbpedia-gsoc] GSoC 2015

2015-03-16 Thread David Przybilla
Hi Olek,



On Sat, Mar 14, 2015 at 10:50 PM, Oleksandr Olgashko <
alexandrolg...@gmail.com> wrote:

> So far, I have done warm-up tasks, and now going to dig source code.
> Could you please review my thoughts about 5.19 task?
>
> (i) Is it interesting to have a unifying value for the selected candidate?
> How would you combine the values from the filters that are already in place
> ?
> If I do not miss anything, the route is as follows: 1) create some
> annotated set of entities 2) design combining function for those three
> values, e.g. mean 3) play with function and coefficients, to find best
> suitable.
>

So yes, the current pipeline is:

  1. Get some surface forms
  2. Match those surface forms to candidate topics
  3. Get the contexts of the candidate topics
  4. Use a disambiguation function to calculate some scores (finalScore,
     percentageOfSecondRank)
  5. Filter out topics below certain thresholds
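
To make the last step of the pipeline concrete, here is a small Python
sketch of threshold filtering. It is not Spotlight's code: the Candidate
class, the filter_candidates function and the 0.5 threshold are
hypothetical, and only the two example scores come from the online demo
mentioned later in this thread.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    surface_form: str   # text span found in the input
    topic: str          # candidate topic for that span
    final_score: float  # score from the disambiguation step


def filter_candidates(candidates, min_final_score):
    """Keep only candidates whose score reaches the threshold."""
    return [c for c in candidates if c.final_score >= min_final_score]


candidates = [
    Candidate("First", "World War I", 1.00),
    Candidate("First", "Football League First Division", 1.45e-7),
]
# 0.5 is an arbitrary threshold chosen for the example.
kept = filter_candidates(candidates, min_final_score=0.5)
```

Any unifying value for task 5.19 would slot in where final_score is
compared, which is why the choice of combining function matters.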

There are some tools for finding the best set of parameters (confidence,
support, etc.) for a given set of annotated data, e.g.:
https://github.com/diegoceccarelli/dexter-eval

In our case we have seen some dodgy quality in the vectors used during
disambiguation, which makes it a bit hard regardless of how good the
function you design is.
There could be other methods of disambiguation which do not necessarily
rely on context vectors, or which use them together with other
information, e.g. graph information.


>
> (ii) can the notion of entity relevance be equated with that of confidence
> ?
> In general, no, that depends on how both are calculated. However, in case
> of entity recognition, relevance of guess is derived from the features
> (e.g. if word ends with "-er", that gives several points in favor that we
> are talking about profession) + algorithm for context, so these concepts
> are same.
> One of possible ways (from specific to DBpedia) to increase the precision
> of algorithm is to find "number of transitions in Wikipedia" between words
> in context. Am I thinking in right direction?
>

I agree that confidence != relevance.
I'm not sure what you mean by
""" "number of transitions in Wikipedia" between words in context. """
Do you mean the distance between topics in the DBpedia graph?


>
>

> By the way, if I choose in online demo `Confidence` -> 0, select `n-best`
> and press `Annotate`, what the numbers in dropdown list means? For example,
> for word `First` first two are World War I (1.00) and Football League First
> Division (1.45e-7)
>

This corresponds to a score named `finalScore`. It is based on the context
vectors and a value called `percentageOfSecondRank`, which estimates the
percentage of the finalScore of the next-best entity compared to the
finalScore of the current one.
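
Read literally, that second value is a ratio between two scores. The
sketch below follows that reading only; it is an assumption about the
formula, not Spotlight's actual implementation, and the function name and
the handling of the single-candidate case are made up for the example.

```python
def percentage_of_second_rank(final_scores):
    """Ratio of the second-best finalScore to the best one.

    Returns 0.0 when there is no second candidate or the best score is 0
    (an assumption; the thread does not say how that case is handled).
    """
    ranked = sorted(final_scores, reverse=True)
    if len(ranked) < 2 or ranked[0] == 0:
        return 0.0
    return ranked[1] / ranked[0]


# Scores quoted from the demo for the surface form "First".
ratio = percentage_of_second_rank([1.00, 1.45e-7])
```

A ratio close to 0 means the best candidate is far ahead of the runner-up;
a ratio close to 1 means the two are nearly tied.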

If you hit the candidates endpoint you can get all of these scores. Here is
an example:

http://spotlight.sztaki.hu:/rest/candidates?confidence=0.0&text=First%20documented%20in%20the%2013th%20century,%20Berlin%20was%20the%20capital%20of%20the%20Kingdom%20of%20Prussia%20(1701%E2%80%931918),%20the%20German%20Empire%20(1871%E2%80%931918),%20the%20Weimar%20Republic%20(1919%E2%80%9333)%20and%20the%20Third%20Reich%20(1933%E2%80%9345).%20Berlin%20in%20the%201920s%20was%20the%20third%20largest%20municipality%20in%20the%20world.%20After%20World%20War%20II,%20the%20city%20became%20divided%20into%20East%20Berlin%20--%20the%20capital%20of%20East%20Germany%20--%20and%20West%20Berlin,%20a%20West%20German%20exclave%20surrounded%20by%20the%20Berlin%20Wall%20from%201961%E2%80%9389.%20Following%20German%20reunification%20in%201990,%20the%20city%20regained%20its%20status%20as%20the%20capital%20of%20Germany,%20hosting%20147%20foreign%20embassies
.
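
A request like the one above can be built without hand-encoding the text.
The sketch below uses only the Python standard library; the path
/rest/candidates and the confidence and text parameters are taken from the
URL above, while the function name candidates_url and the localhost base
URL with port 2222 are placeholders that must be replaced with a real
Spotlight server.

```python
from urllib.parse import quote, urlencode


def candidates_url(base_url, text, confidence=0.0):
    """Build a URL for the candidates endpoint with percent-encoded text."""
    # quote_via=quote encodes spaces as %20, as in the example URL.
    query = urlencode({"confidence": confidence, "text": text},
                      quote_via=quote)
    return f"{base_url}/rest/candidates?{query}"


# Placeholder host and port, not a real server.
url = candidates_url("http://localhost:2222", "Berlin was the capital")
```

The resulting URL can then be fetched with any HTTP client.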


>
> 2015-03-09 14:52 GMT+02:00 Oleksandr Olgashko :
>
>> Found warm-up tasks for DBpedia Spotlight, sorry for inconvenience
>>
>> 2015-03-09 13:06 GMT+02:00 Oleksandr Olgashko :
>>
>>> Thanks for answers,
>>>
>>> On previous project I was working on several named entity recognition
>>> classifiers (naive Bayes and conditional random field based, we used
>>> Ontonotes corpus data), also I have brief experience with Apache Spark.
>>> So, probably, 5.16 and 5.17 would be most suitable for me, and 5.14 is
>>> worth to think about.
>>> Could you please give some warm-up tasks for these ideas?
>>> Also, is it possible to use Stanford NLP (GPL license?)
>>>
>>> 2015-03-09 12:42 GMT+02:00 David Przybilla :
>>>
>>>> Hi Oleksandr,
>>>>
>>>> 5.16, 5.17 both involve Scala + a bit of Natural Language Processing.
>>>> 5.17 is more about being able to massage a Wikipedia dump and getting
>>>> numbers out of it for named entity recognition.
>>>>
>>>> On Mon, Mar 9, 2015 at 9:27 AM, Dimitris Kontokostas wrote:
>>>>
>>>>> Hi Oleksandr & welcome
>>>>>
>>>>> I'd suggest you narrow down your topics to very few 1-2 in order to be
>>>>> able to better focus on your final proposal.
>>>>> Let us know if you have any questions
>>>>>
>>>>> Cheers,
>>>>> Dimitris
>>>>>
>>>>> On Sun, Mar 8, 2015 at 11:59 P

Re: [Dbpedia-gsoc] JSONpedia Warmup tasks

2015-03-16 Thread Michele Mostarda
Hi Navin,

On 16 March 2015 at 07:35, Navin Pai  wrote:

> Hey Michele,
>
> Sorry about the late reply, I was on break over the last week... I just
> had a look at the issue list. I'll get started with one of them soon. I
> have a few doubts, but I'll probably add them on the issues itself rather
> than on the mailing list.
>
No rush, feel free to write to me or ping me on Skype if you get stuck.

Best
Michele

>
> Thanks
>
> On Tue, Mar 10, 2015 at 9:43 PM, Michele Mostarda <
> michele.mosta...@gmail.com> wrote:
>
>> Hi Navin,
>>   just to clarify: we have two repositories:
>>
>> R1) the official JSONpedia project repository [1]: contains the JSONpedia
>> library source code, issue tracker and documentation.
>>
>> R2) the jsonpedia-extractor subproject of dbpedia [3]: contains specific
>> code related to the integration of JSONpedia in DBpedia.
>> This repo (which I'm going to clean up) already contains some
>> prototype code developed last year (GSoC 2014), which has been merged
>> into the official JSONpedia code base; please don't worry too much about it.
>> What we are going to use from [3] during this warmup stage is just the
>> issue tracker, where we are asked to track any activity done on JSONpedia in
>> relation to GSoC.
>>
>> I've updated the official JSONpedia issue tracker [2]. In order of
>> complexity you could deal with one of the following issues:
>> #11 #8 #10 #7 #9 #13.
>>
>> Thanks for supporting us.
>>
>> Best
>> Michele
>>
>> [1] https://bitbucket.org/hardest/jsonpedia/
>> [2] https://bitbucket.org/hardest/jsonpedia/issues?status=new&status=open
>> [3] https://github.com/dbpedia/jsonpedia-extractor
>>
>> On 10 March 2015 at 12:15, Navin Pai  wrote:
>>
>>> Hey,
>>> That sounds great, will keep an eye on the list of open issues.
>>>
>>> Also, if you find any issue that you think would be a good one for me to
>>> tackle as a start, do let me know. Pointers would definitely help. :)
>>>
>>> Thanks
>>> Navin
>>>
>>> On Tue, Mar 10, 2015 at 3:32 PM, Michele Mostarda <
>>> michele.mosta...@gmail.com> wrote:
>>>
>>>> Hi Navin,
>>>>
>>>> On 9 March 2015 at 20:49, Navin Pai wrote:
>>>>
>>>>> Hey Michele,
>>>>>
>>>>> I managed to get it up and running by simply using the jersey jars in
>>>>> the classpath manually [1]. Didn't have to include any of the other jars.
>>>>>
>>>> Yes, you're right: with the fat jar (jar of jars) you just need to
>>>> specify the list of dependencies which are included inside.
>>>>
>>>>> The Maven shade plugin was a repeated suggestion for the "jar of jars"
>>>>> problem. Will check out the commit link you've provided.
>>>>>
>>>> The clean solution I introduced yesterday uses the shade plugin. It
>>>> no longer produces a jar of jars but just an unpacked classpath, and
>>>> it also takes care of merging the jar metadata, avoiding wrong
>>>> overrides, which was the original problem.
>>>>
>>>>> I seem to be up and running with both Mongodb as well as
>>>>> ElasticSearch! :) I'm just checking out the API mentioned on
>>>>> http://localhost:PORT/frontend/store.html to get a better feel of it.
>>>>>
>>>> The screenshot looks correct, very good job.
>>>>
>>>> During the day I'm going to publish all the open issues on the project
>>>> issue tracker.
>>>> Some of them are really complex and require deep knowledge of the
>>>> project and technical documentation that does not yet exist; others
>>>> are self-contained.
>>>> If you are interested in working on some of them please let me know, I
>>>> will provide all the support to make you effective.
>>>>
>>>>> Thanks
>>>>>
>>>> Thanks a lot.
>>>> Best
>>>> Michele
>>>>
>>>>> [1] http://i.imgur.com/1IH7Kl7.png
>>>>>
>>
>>
>> --
>> Michele Mostarda
>> Senior Software Engineer
>> skype: michele.mostarda
>> twitter: micmos
>> mail: m...@michelemostarda.com
>> site: http://michelemostarda.it
>>
>
>


-- 
Michele Mostarda
Senior Software Engineer
skype: michele.mostarda
twitter: micmos
mail: m...@michelemostarda.com
site: http://michelemostarda.it


[Dbpedia-gsoc] GSoC '15 - Interest in 5.14 (Scalable querying of the live DBpedia data stream)

2015-03-16 Thread Pablo Estrada
Hello everyone,
my name is Pablo, from Mexico. I am a master's student of scientific
computing at Seoul National University, in South Korea. I am interested in
project 5.14, relating to scalable querying of the DBpedia data stream. I
have previous experience interning in Google's knowledge engine team, and I
have some understanding of the considerations in the querying of knowledge
bases. I feel I have a good background to tackle this project this GSoC.
Also, I think it's a very desirable project for DBpedia - Querying of live
updates to DBpedia would bring benefits such as speed of development for
app developers using DBpedia; as well as the benefit of having access to
fresher data.
I will read through the references in the next couple of days, and exchange
some ideas in this email chain.
Also, I'll start taking a look at the warm up tasks : )

Any advice, or extra references would be appreciated.
Thanks all

Pablo


[Dbpedia-gsoc] GSOC Introduction

2015-03-16 Thread Abhishek Tiwari
Hi all,

My name is Abhishek Tiwari. I am a fourth-year undergraduate student at
IIT (BHU), Varanasi. I have been working on my semester project,
"Identification of causal relation in natural language text with the help
of graph patterns". This project gave me experience in handling the Stanford
parser (for chunking and obtaining the parse tree format) and SenseLearner
(word sense disambiguation).
I have also learnt a number of Python libraries such as lxml, nltk,
multiprocessing, networkx (for graph representation) and graph-tool. I also
had to use the streaming API in Hadoop while writing MapReduce jobs in
Python in order to manage a large number of computations.

Currently I am trying the warmup tasks listed for 5.1 Fact
extraction from Wikipedia text.
I am also interested in the NLP topics from dbpedia-spotlight:
5.15 Better Context Vectors
5.16 Better Surface Form Matching
5.19 Confidence/Relevance Scores

I am going to try the warmup tasks for these topics. Please advise on how
best to understand the above topics.

Regards,
Abhishek Tiwari


Re: [Dbpedia-gsoc] GSOC Introduction

2015-03-16 Thread Kartik Singhal
Hi Axel

Should we set up a time for a Skype chat? That way I will be online at that
time and we will be able to discuss my queries and also the warm-up
tasks.

regards,
Kartik

On Sun, Mar 15, 2015 at 4:27 PM, Kartik Singhal  wrote:

> Hi Axel
>
> Would you want to send me your Skype contacts?
> Yes, sure. My Skype contact name is kartik.kk.singhal. You can
> also find me through my mail ID, i.e.
> gkkart...@gmail.com. We can also do it through Hangouts.
>
> Which one would you want to go for?
> Both problems are very interesting and new to me. I am
> a little bit inclined towards color spreading activation
> since I was able to understand it better. But if you prefer
> working on the second problem, then I have absolutely no problem
> at all.
>
> regards,
> Kartik
>
> On Sun, Mar 15, 2015 at 2:42 PM, Axel Ngonga <
> ngo...@informatik.uni-leipzig.de> wrote:
>
>>  Hi Kartik,
>>
>> Hi Axel
>>
>>  First of all sorry for replying late, I was busy with my real brother's
>> marriage and so was not able to reply.
>>
>> No problem at all.
>>
>>
>>  I was not able to get a clear idea of Section 2.2 in [2].
>> In particular, during query construction, how are the initial and second
>> triple patterns formed from the resources identified in the previous steps?
>>
>> It might be easier to discuss this via Skype. Would you want to send me
>> your skype contacts?
>>
>>
>>  Also, while reading these papers, I got a clearer picture of how you want
>> to approach this topic.
>> First we will generate SPARQL queries from the user-defined queries using
>> the resource disambiguation process and query construction as specified in
>> [2].
>> Then we will verbalize them using SPARQL2NL, and then we will apply the color
>> spreading activation algorithm specified in [1] to get the answers to the
>> queries.
>> I may be wrong, as I am just trying to understand the idea.
>>
>> Well it depends on the problem that we are trying to solve. For the
>> keyword search, we would implement an improved version of the spreading
>> activation algorithm in [1] and aim to make it as time-efficient as
>> possible. This will include transforming RDF into a fact-based
>> representation and including the corresponding lookup mechanisms. For the
>> question answering, we would (1) generate SPARQL queries, (2) generate
>> results and (3) verbalize the queries and their results. Which one would
>> you want to go for?
>>
>>
>>  I am currently studying SPARQL2NL. You can now give me some warm-up
>> tasks that will make me fluent in the above process and help me
>> get started.
>>
>>  regards,
>> Kartik
>>
>> Best,
>> Axel
>>
>>
>>  [1] http://goo.gl/dPbP3F
>>  [2] http://dl.acm.org/citation.cfm?id=2488488
>>
>> On Mon, Mar 9, 2015 at 1:20 PM, Axel Ngonga <
>> ngo...@informatik.uni-leipzig.de> wrote:
>>
>>>  Hello Kartik,
>>>
>>> I'm delighted that you are interested in the search topic. Please read
>>> * http://goo.gl/dPbP3F
>>> * http://dl.acm.org/citation.cfm?id=2488488
>>>
>>> Feel free to contact me if you have further questions or for a warm-up
>>> task.
>>>
>>> Best regards,
>>> Axel
>>>
>>>  Hi everyone,
>>>
>>>  I am Kartik, a fourth-year student in Computer Science from India. I
>>> am comfortable working in Java/Python/C/C++ and Matlab.
>>>
>>>  I have been working in the field of Natural Language Processing and
>>> Computational Linguistics, particularly on keyword extraction and sentiment
>>> analysis, for almost two years, and have been using many of DBpedia's NLP
>>> applications in my projects, like AlchemyAPI and the dataTXT semantic text
>>> API, for the past year. Currently I am working on DBpedia dumps to create a
>>> local search engine.
>>>
>>>  I would love to contribute to project idea 5.9 (Keyword Search on
>>> DBpedia) because of my familiarity with the topic, although
>>> I don't know much about SPARQL queries. I am keen to learn
>>> new things, and the project seems quite interesting.
>>> If the mentor can provide me some initial insights so that I know
>>> where to start, that would be really great.
>>>
>>>  Thanks and regards,
>>> Kartik Singhal
>>> --
>>>  Regards,
>>> Kartik Singhal
>>> Final Year Undergraduate
>>> Computer Science and Engineering
>>> The LNM Institute of Information Technology, Jaipur
>>> Email: gkkart...@gmail.com Mobile: +919530375881
>>>

Re: [Dbpedia-gsoc] GSOC Introduction

2015-03-16 Thread Marco Fossati
Hi Abhishek,

We are already working on your pull request, thanks!
Feel free to share any thoughts on this mailing list (except those specific
to the repo code).
Cheers!

On 16 March 2015 at 14:28, Abhishek Tiwari  wrote:

> Hi all,
>
> My name is Abhishek Tiwari. I am a fourth-year undergraduate student at
> IIT (BHU), Varanasi. I have been working on my semester project,
> "Identification of causal relation in natural language text with the help
> of graph patterns". This project gave me experience with the Stanford
> parser (for chunking and obtaining parse trees) and SenseLearner (word
> sense disambiguation).
> I have also learnt a wide range of Python libraries, such as lxml, nltk,
> multiprocessing, networkx (for graph representation) and graph-tool. I also
> had to use the Hadoop Streaming API to write MapReduce jobs in Python in
> order to manage a large number of computations.
>
> Currently I am trying the warm-up tasks listed for 5.1 Fact
> extraction from Wikipedia text.
> I am also interested in the NLP topics from dbpedia-spotlight:
> 5.15 Better Context Vectors
> 5.16 Better Surface Form Matching
> 5.19 Confidence/Relevance Scores
>
> I am going to try the warm-up tasks for these topics. Please guide me on how
> best to understand them.
>
> Regards,
> Abhishek Tiwari
>


-- 
Marco Fossati
http://about.me/marco.fossati
Twitter: @hjfocs
Skype: hell_j


Re: [Dbpedia-gsoc] GSoC '15 - Interest in 5.14 (Scalable querying of the live DBpedia data stream)

2015-03-16 Thread Ruben Verborgh
Hi Pablo,

> I am interested in project 5.14, relating to scalable querying of the DBpedia 
> data stream.


Great to hear! I can tell you it's a very cool project :-)

> Any advice, or extra references would be appreciated.

I definitely recommend trying out the server 
(http://fragments.dbpedia.org/2014/en)
and client (http://client.linkeddatafragments.org/).
Our main publication on this topic should give many insights:
http://linkeddatafragments.org/publications/iswc2014.pdf.

Here are some things you can try to warm up:
– Use the interface from the command line (for instance, using curl): 
http://fragments.dbpedia.org/2014/en.
– Retrieve responses in various content types through content negotiation. The 
server currently supports HTML, JSON(-LD), Turtle, TriG, N-Triples, N-Quads.
– Parse one or more responses and try to understand their differences.
– Set up a local server using a dataset of your choice. (Many datasets can be 
found here: http://lodlaundromat.org/wardrobe/.)
– Try to set up a local server with DBpedia 2014 or DBpedia live.
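
As a rough illustration of the first two warm-up items, here is a minimal Python sketch of content negotiation against the server. It is only a sketch under assumptions: the endpoint is the one given above and may have moved since this message was written, the media types are the standard ones for the listed formats rather than a confirmed list of what the server accepts, and the names build_request and fetch are hypothetical helpers, not part of any Linked Data Fragments library. The same idea works from the command line with curl and an "Accept" header.

```python
import urllib.request

# Endpoint taken from the message above; it may have moved since 2015.
FRAGMENTS_URL = "http://fragments.dbpedia.org/2014/en"

# Standard media types for the formats listed above. Whether the server
# honours each of them is an assumption, not something verified here.
MEDIA_TYPES = {
    "html": "text/html",
    "json-ld": "application/ld+json",
    "turtle": "text/turtle",
    "trig": "application/trig",
    "n-triples": "application/n-triples",
    "n-quads": "application/n-quads",
}


def build_request(fmt, url=FRAGMENTS_URL):
    """Return a GET request that asks the server for the given format."""
    return urllib.request.Request(url, headers={"Accept": MEDIA_TYPES[fmt]})


def fetch(fmt, url=FRAGMENTS_URL, timeout=30):
    """Send the request and return (Content-Type header, decoded body)."""
    with urllib.request.urlopen(build_request(fmt, url), timeout=timeout) as resp:
        return resp.headers.get("Content-Type"), resp.read().decode("utf-8")
```

Calling fetch("turtle") and fetch("json-ld") in turn and comparing the two bodies is one way to approach the "parse and understand the differences" item.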

If you have any questions, just mail us!

Best,

Ruben


Re: [Dbpedia-gsoc] Newer to comtribute

2015-03-16 Thread 黄章帅
Hi all,

I am Hzshuai.
I'm from EECS of Peking University, Beijing, China. My research field
includes NLP and machine learning.

I have experience in information extraction and machine translation (from
Chinese to English and vice versa),
gained when I was an intern at a start-up in Beijing for 6 months.
I am familiar with C/C++, Python and Java, but I'm a beginner in Scala, which
(as far as I know) is a cool functional language.

I learned about DBpedia through GSoC 2015. After browsing the project, I
realized it is a great choice for me to get involved with
the open source community. Now I'm eager to make some contributions to the
DBpedia codebase, not only to practise the knowledge
I have learned, but also to get my hands dirty and solve real, challenging
problems.

These days I am trying to figure out the tasks in this project, such as the
ideas for GSoC 2015, the programming language and the environment.
Since I am in the first year of my MS, I can spend a lot of time on
open source, so I'd like to participate in a long-term task.
Any help or easy-to-hard directions will be highly appreciated.

Thanks.

2015-03-16 22:47 GMT+08:00 黄章帅 :

> Hi all,
>
> I am Hzshuai.
> I'm from EECS of Peking University, Beijing, China. My research field
> includes NLP and machine learning.
>
> I have experience in information extraction and machine translation (from
> Chinese to English and vice versa),
> gained when I was an intern at a start-up in Beijing for 6 months.
> I am familiar with C/C++, Python and Java, but I'm a beginner in Scala, which
> (as far as I know) is a cool functional language.
>
> I learned about DBpedia through GSoC 2015. After browsing the project, I
> realized it is a great choice for me to get involved with
> the open source community. Now I'm eager to make some contributions to the
> DBpedia codebase, not only to practise the knowledge
> I have learned, but also to get my hands dirty and solve real, challenging
> problems.
>
> These days I am trying to figure out the tasks in this project, such as the
> ideas for GSoC 2015, the programming language and the environment.
> Since I am in the first year of my MS, I can spend a lot of time on
> open source, so I'd like to participate in a long-term task.
> Any help or easy-to-hard directions will be highly appreciated.
>
> Thanks.
>


[Dbpedia-gsoc] Student application period open

2015-03-16 Thread Dimitris Kontokostas
Dear students,

The application period just started and you are all welcome to start
submitting your applications.
From our experience, it's better to focus on one idea, submit early and ask
for feedback from the mentors in order to improve until the deadline. (Of
course there are exceptions, and you can focus on more ideas if you want.)

In general, feedback will come from mentors directly through the Melange
system as comments on your application, but you can still use this mailing
list for general questions.

Best of luck to all of you. I am sure we'll see a lot of amazing
applications, like we did in previous years ;)

Cheers,
Dimitris

-- 
Kontokostas Dimitris