
[GitHub] jena pull request: jena-text multilingual indexing (take 2)

2015-05-21 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/jena/pull/52


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] jena pull request: jena-text multilingual indexing (take 2)

2015-05-21 Thread afs

Github user afs commented on the pull request:

https://github.com/apache/jena/pull/52#issuecomment-104230097
  
The submitter can close it - I'll close it anyway.



[GitHub] jena pull request: jena-text multilingual indexing (take 2)

2015-05-21 Thread rvesse

Github user rvesse commented on the pull request:

https://github.com/apache/jena/pull/52#issuecomment-104211075
  
Oops, my bad



[GitHub] jena pull request: jena-text multilingual indexing (take 2)

2015-05-21 Thread osma

Github user osma commented on the pull request:

https://github.com/apache/jena/pull/52#issuecomment-104210829
  
@rvesse please look at pull request #64 instead, this one is obsolete I 
think, and should be closed



[GitHub] jena pull request: jena-text multilingual indexing (take 2)

2015-05-21 Thread rvesse

Github user rvesse commented on the pull request:

https://github.com/apache/jena/pull/52#issuecomment-104210822
  
One minor comment but otherwise LGTM

Pinging @Stephen-Allen and @ehedgehog who have previously developed a lot 
of this code for further comments



[GitHub] jena pull request: jena-text multilingual indexing (take 2)

2015-05-21 Thread rvesse

Github user rvesse commented on a diff in the pull request:

https://github.com/apache/jena/pull/52#discussion_r30788116
  
--- Diff: jena-text/src/main/java/org/apache/jena/query/text/TextIndexLuceneMultilingual.java ---
@@ -0,0 +1,138 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.jena.query.text;
+
+import org.apache.jena.graph.Node;
+import org.apache.lucene.analysis.Analyzer;
+import org.apache.lucene.store.Directory;
+import org.apache.lucene.store.FSDirectory;
+import org.apache.lucene.store.RAMDirectory;
+
+import java.io.File;
+import java.io.IOException;
+import java.util.*;
+
+public class TextIndexLuceneMultilingual implements TextIndex {
+
+Hashtable<String, TextIndex> indexes;
+private final EntityDefinition docDef;
+private final Directory directory ;
+
+public TextIndexLuceneMultilingual(Directory directory, EntityDefinition def) {
+this.directory = directory ;
+this.docDef = def;
+indexes = new Hashtable<>();
+
+//default index created first. Localized index will be created on the fly.
+TextIndex index = new TextIndexLucene(directory, def, null, null);
+indexes.put("default", index);
+}
+
+public Collection<TextIndex> getIndexes() {
+return indexes.values();
+}
+
+TextIndex getIndex(String lang) {
+lang = LuceneUtil.getISO2Language(lang);
+if (lang == null)
+lang = "default";
+
+if (!indexes.containsKey(lang)) {
+//dynamic creation of localized index
+try {
+Analyzer analyzer = LuceneUtil.createAnalyzer(lang);
+if (analyzer != null) {
+Directory langDir;
+if (directory instanceof FSDirectory) {
+File dir = ((FSDirectory) directory).getDirectory();
+File indexDirLang = new File(dir, lang);
+langDir = FSDirectory.open(indexDirLang);
+}
+else
+langDir = new RAMDirectory();
+TextIndex index = new TextIndexLucene(langDir, docDef, analyzer, null);
+indexes.put(lang, index);
+}
+else
+lang = "default";
+} catch (IOException e) {
+e.printStackTrace();
+}
+}
+
+return indexes.get(lang);
+}
+
+@Override
+public void prepareCommit() {
+for (TextIndex index : indexes.values())
+index.prepareCommit();
+}
+
+@Override
+public void commit() {
+for (TextIndex index : indexes.values())
+index.commit();
+}
+
+@Override
+public void rollback() {
+for (TextIndex index : indexes.values())
+index.rollback();
+}
+
+@Override
+public void addEntity(Entity entity) {
+String lang = entity.getLanguage();
+getIndex(lang).addEntity(entity);
+}
+
+@Override
+public void updateEntity(Entity entity) {
+String lang = entity.getLanguage();
+getIndex(lang).updateEntity(entity);
+}
+
+@Override
+public Map<String, Node> get(String uri) {
+return null;
--- End diff --

Are these methods not implemented for a reason?
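
For illustration only, here is one way a composite index could answer `get(uri)` instead of returning `null`: ask each per-language index in turn and return the first hit. This is a minimal sketch with hypothetical stand-in types (`SimpleIndex`, `CompositeIndex`), not the real `TextIndex` interface, and the thread does not say whether the pull request intended this behaviour.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the TextIndex interface; only get() is modelled.
interface SimpleIndex {
    Map<String, String> get(String uri);   // null when the URI is not indexed
}

// Composite that consults each registered index in insertion order.
class CompositeIndex implements SimpleIndex {
    private final Map<String, SimpleIndex> indexes = new LinkedHashMap<>();

    void register(String lang, SimpleIndex index) {
        indexes.put(lang, index);
    }

    @Override
    public Map<String, String> get(String uri) {
        for (SimpleIndex index : indexes.values()) {
            Map<String, String> fields = index.get(uri);
            if (fields != null) {
                return fields;   // first index that knows the URI wins
            }
        }
        return null;             // not found in any index
    }
}
```

A URI indexed under several languages would need a merge policy rather than "first hit wins"; that choice is left open here.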



[GitHub] jena pull request: jena-text multilingual indexing (take 2)

2015-05-13 Thread osma

Github user osma commented on the pull request:

https://github.com/apache/jena/pull/52#issuecomment-101707948
  
Good that you got it working! Yes, I think opening a new pull request makes 
sense - or just merge the work to this one, whatever is easiest for you.



[GitHub] jena pull request: jena-text multilingual indexing (take 2)

2015-05-12 Thread amiara514

Github user amiara514 commented on the pull request:

https://github.com/apache/jena/pull/52#issuecomment-101387550
  
Thanks a lot, I had not seen this method.
Well, it seems to work and all tests pass.
Should I propose the OneIndex branch in a new pull request ?



[GitHub] jena pull request: jena-text multilingual indexing (take 2)

2015-05-12 Thread osma

Github user osma commented on the pull request:

https://github.com/apache/jena/pull/52#issuecomment-101336857
  
Oh, right. Hmm.

How about this: TextIndexLucene currently calls IndexWriter.addDocument() 
and IndexWriter.updateDocument(). But there are versions of these methods that 
take an Analyzer parameter which overrides the default analyzer of the index. 
So if you could, within TextIndexLucene methods (or perhaps your own, similar 
class TextIndexLuceneMultilingual), determine the correct analyzer to use based 
on the Entity, then you could call addDocument and updateDocument with the 
right Analyzer parameter.
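
The idea of choosing the analyzer per document can be sketched without Lucene. The code below is a toy model under stated assumptions: `ToyWriter`, `ToyAnalyzers` and `PerLanguageIndexer` are hypothetical names, an "analyzer" is just a function from text to tokens, and the two `addDocument` variants only mimic the shape of the overload described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Toy analyzers: plain functions from text to tokens (stand-ins, not Lucene classes).
class ToyAnalyzers {
    // Lower-cases and splits on whitespace.
    static List<String> standard(String text) {
        return Arrays.asList(text.toLowerCase(Locale.ROOT).trim().split("\\s+"));
    }

    // Additionally strips a trailing "s" as a crude imitation of English stemming.
    static List<String> english(String text) {
        List<String> out = new ArrayList<>();
        for (String t : standard(text)) {
            boolean plural = t.length() > 1 && t.endsWith("s");
            out.add(plural ? t.substring(0, t.length() - 1) : t);
        }
        return out;
    }
}

// Toy writer with a default analyzer and a per-call override.
class ToyWriter {
    private final Function<String, List<String>> defaultAnalyzer;
    private final List<List<String>> indexedTokens = new ArrayList<>();

    ToyWriter(Function<String, List<String>> defaultAnalyzer) {
        this.defaultAnalyzer = defaultAnalyzer;
    }

    void addDocument(String text) {
        addDocument(text, defaultAnalyzer);
    }

    void addDocument(String text, Function<String, List<String>> analyzer) {
        indexedTokens.add(analyzer.apply(text));
    }

    List<List<String>> indexedTokens() {
        return indexedTokens;
    }
}

// Picks the analyzer from the entity's language; unknown languages use the default.
class PerLanguageIndexer {
    private final ToyWriter writer;
    private final Map<String, Function<String, List<String>>> analyzers = new HashMap<>();

    PerLanguageIndexer(ToyWriter writer) {
        this.writer = writer;
    }

    void register(String lang, Function<String, List<String>> analyzer) {
        analyzers.put(lang, analyzer);
    }

    void addEntity(String text, String lang) {
        Function<String, List<String>> analyzer = analyzers.get(lang);
        if (analyzer == null) {
            writer.addDocument(text);            // writer's default analyzer
        } else {
            writer.addDocument(text, analyzer);  // per-document override
        }
    }
}
```

With this shape a single writer serves every language, which is the property the suggestion relies on.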



[GitHub] jena pull request: jena-text multilingual indexing (take 2)

2015-05-12 Thread amiara514

Github user amiara514 commented on the pull request:

https://github.com/apache/jena/pull/52#issuecomment-101331810
  
Hi, I'm not against your suggestion, it's probably easier to deal with one 
index.
So, I made some positive tests that cover the points discussed previously 
(index a lang var and query on it). 
But a problem persists: we need to set the indexing analyzer dynamically on 
each triple addition, since each triple may have a different language. 
I don't think it's possible to change it on the fly. The IndexWriter config 
is set at startup and the lock mechanism prevents changing it...





[GitHub] jena pull request: jena-text multilingual indexing (take 2)

2015-05-12 Thread osma

Github user osma commented on the pull request:

https://github.com/apache/jena/pull/52#issuecomment-101232751
  
> But how to change correctly the existent code to target Lucene taking 
that extra language into account ?

You are currently already parsing the property function arguments in 
TextQueryPF.build() to switch to the correct text index.

If you want to instead have a single index with an extra lang field like I 
suggested (note that this is just a suggestion - I think this would be cleaner 
than having many separate indexes, but you or others may of course disagree!), 
then you probably need to move that logic instead to the objectToStruct() 
method in the same class, which also has access to the property function 
arguments and already does a lot of parsing. objectToStruct() creates the query 
string for Lucene, so you should be able to add an extra parameter there (look 
at the query function which [adds a graph 
parameter](https://github.com/LICEF/jena/blob/upstream/jena-text-multilingual/jena-text/src/main/java/org/apache/jena/query/text/TextQueryPF.java#L218)
 for an example).
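
As a rough sketch of adding a language restriction to the query string, the helper below wraps the user's query and appends a field clause in Lucene query syntax. The class name, the method name and the field name `lang` are assumptions made for illustration; this is not the code in `TextQueryPF`.

```java
import java.util.Locale;

class LangQueryBuilder {
    // Hypothetical field name; the thread suggests "lang" or "language".
    static final String LANG_FIELD = "lang";

    // Returns e.g. "(gift) AND lang:de", or the query unchanged when no language is given.
    static String restrictToLanguage(String queryString, String lang) {
        if (lang == null || lang.isEmpty()) {
            return queryString;
        }
        // Only bare two- or three-letter codes are accepted in this sketch.
        if (!lang.matches("[A-Za-z]{2,3}")) {
            throw new IllegalArgumentException("unexpected language code: " + lang);
        }
        return "(" + queryString + ") AND " + LANG_FIELD + ":" + lang.toLowerCase(Locale.ROOT);
    }
}
```

Wrapping the original query in parentheses keeps any operators inside it from binding to the added clause.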



[GitHub] jena pull request: jena-text multilingual indexing (take 2)

2015-05-11 Thread amiara514

Github user amiara514 commented on the pull request:

https://github.com/apache/jena/pull/52#issuecomment-101035985
  
> But since there is only a relatively small number of Lucene analyzers 
anyway, maybe this is OK.

It's why it's done like this :-)

>No, that wouldn't work. You have to use the same analyzer for both 
indexing and queries (in this case, the language-specific analyzer), otherwise 
the tokens won't match. 

Exactly

> But I think it should still be possible to share the same index, if you 
have a field that specifies the language and make sure to target your queries 
only to the specific language.

Storing the language as an extra field is easy to do during document 
creation (in the addEntity method). Adding an extra param to queries is not a 
problem either (done in my solution).
But how to change correctly the existent code to target Lucene taking that 
extra language into account ?





[GitHub] jena pull request: jena-text multilingual indexing (take 2)

2015-05-11 Thread osma

Github user osma commented on the pull request:

https://github.com/apache/jena/pull/52#issuecomment-101017747
  
Great tests!

I wonder if there isn't a better method to convert 3 letter ISO 639 
language codes to the 2 letter equivalents. But since there is only a 
relatively small number of Lucene analyzers anyway, maybe this is OK.
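
One possible alternative to a hand-written table is to derive the mapping from `java.util.Locale`, which knows the three-letter code for each two-letter ISO 639 language. The sketch below uses only the standard library; `LangCodes` and `toIso2` are hypothetical names. It covers the three-letter codes that `Locale` itself reports (for example `deu`, `fra`), so bibliographic variants such as `ger` or `fre` would still need separate handling.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.MissingResourceException;

class LangCodes {
    private static final Map<String, String> ISO3_TO_ISO2 = new HashMap<>();

    static {
        // Build the reverse mapping once from the JDK's list of two-letter codes.
        for (String iso2 : Locale.getISOLanguages()) {
            try {
                String iso3 = Locale.forLanguageTag(iso2).getISO3Language();
                ISO3_TO_ISO2.putIfAbsent(iso3, iso2);
            } catch (MissingResourceException e) {
                // No three-letter code known for this entry; skip it.
            }
        }
    }

    // Returns the two-letter code, or null when the three-letter code is unknown.
    // Two-letter input is returned lower-cased without further validation.
    static String toIso2(String code) {
        if (code == null) {
            return null;
        }
        String c = code.toLowerCase(Locale.ROOT);
        if (c.length() == 2) {
            return c;
        }
        return ISO3_TO_ISO2.get(c);
    }
}
```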

> About the implementation, your proposal would use a StandardAnalyzer on 
indexing phase and a localized queryAnalyzer for queries ?

No, that wouldn't work. You have to use the same analyzer for both indexing 
and queries (in this case, the language-specific analyzer), otherwise the 
tokens won't match. 

But I think it should still be possible to share the same index, if you 
have a field that specifies the language and make sure to target your queries 
only to the specific language.
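
Why the analyzers have to match can be shown with a toy example. The sketch below is not Lucene code: `plainTokens` and `stemmedTokens` are hypothetical stand-ins for a standard and a language-specific analyzer, and matching is reduced to set containment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;

class AnalyzerMismatchDemo {
    // Stand-in for a standard analyzer: lower-case and split on whitespace.
    static List<String> plainTokens(String text) {
        return Arrays.asList(text.toLowerCase(Locale.ROOT).trim().split("\\s+"));
    }

    // Stand-in for a language-specific analyzer: also strips a trailing "s".
    static List<String> stemmedTokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : plainTokens(text)) {
            boolean plural = t.length() > 1 && t.endsWith("s");
            out.add(plural ? t.substring(0, t.length() - 1) : t);
        }
        return out;
    }

    // True when every query token is among the indexed tokens.
    static boolean matches(List<String> indexed, List<String> query) {
        return new HashSet<>(indexed).containsAll(query);
    }
}
```

If "Old Books" is indexed with `stemmedTokens`, the index holds `book`; a query for "books" analyzed with `plainTokens` looks for `books` and finds nothing, while the same query analyzed with `stemmedTokens` matches.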



[GitHub] jena pull request: jena-text multilingual indexing (take 2)

2015-05-11 Thread amiara514

Github user amiara514 commented on the pull request:

https://github.com/apache/jena/pull/52#issuecomment-100941772
  
Some new tests have been submitted.

About the implementation, your proposal would use a StandardAnalyzer on 
indexing phase and a localized queryAnalyzer for queries ?




[GitHub] jena pull request: jena-text multilingual indexing (take 2)

2015-05-08 Thread osma

Github user osma commented on the pull request:

https://github.com/apache/jena/pull/52#issuecomment-100212921
  
Configuring via the assembler is very good! And I see you have also added 
some tests. Excellent!

For the tests, I think you could add a few more test cases (especially 
TestDatasetWithLuceneMultilingualTextIndex seems incomplete). For example, 
checking that a query for the German word "Gift" (meaning poison) doesn't match 
the English word "gift", as it would with the standard analyzer. Also you could 
check that at least some of the language-specific stemming rules work with the 
multilingual index, just like you have already tested for book/books in 
TestDatasetWithLocalizedAnalyzer.

About the implementation of multilingual indexes: You have now made a 
separate Lucene index for each language. This is a valid solution, but have you 
considered instead using a single index with an extra field "lang" or 
"language" that would store the language of the indexed literal? Then at query 
time you could simply add an extra parameter to all queries restricting the 
lookup to a particular language. That might give simpler code and/or a more 
compact index in some cases. The analyzer would still be selected based on the 
language, of course.
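
The single-index idea and the "Gift" test case can be sketched together with an in-memory toy. Everything here is hypothetical (`LangFieldIndex`, its methods, whitespace tokenizing); a real implementation would keep the language in a Lucene field and would still pick the analyzer by language.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class LangFieldIndex {
    // One entry per indexed literal, with its language kept alongside the tokens.
    private static final class Doc {
        final String uri;
        final String lang;
        final List<String> tokens;

        Doc(String uri, String lang, List<String> tokens) {
            this.uri = uri;
            this.lang = lang;
            this.tokens = tokens;
        }
    }

    private final List<Doc> docs = new ArrayList<>();

    void add(String uri, String lang, String text) {
        List<String> tokens = Arrays.asList(text.toLowerCase(Locale.ROOT).trim().split("\\s+"));
        docs.add(new Doc(uri, lang, tokens));
    }

    // Returns URIs whose text contains the term; a non-null lang restricts the lookup.
    List<String> search(String term, String lang) {
        String needle = term.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (Doc d : docs) {
            boolean langOk = lang == null || lang.equals(d.lang);
            if (langOk && d.tokens.contains(needle)) {
                hits.add(d.uri);
            }
        }
        return hits;
    }
}
```

A unit test along the suggested lines would index an English literal containing "gift" and a German literal containing "Gift", then check that a search restricted to `de` returns only the German one.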



[GitHub] jena pull request: jena-text multilingual indexing (take 2)

2015-05-07 Thread amiara514

Github user amiara514 commented on the pull request:

https://github.com/apache/jena/pull/52#issuecomment-99979270
  
Hi,
with the last proposal :
1) It's now possible to set multilingual indexing via assembler 
configuration file by defining the multilingual class and using it in the index 
definition :
```
[] ja:loadClass "org.apache.jena.query.text.TextQuery" .
text:TextDataset  rdfs:subClassOf   ja:RDFDataset .
#text:TextIndexLucene  rdfs:subClassOf   text:TextIndex .
text:TextIndexLuceneMultilingual rdfs:subClassOf   text:TextIndex .

<#indexLucene> a text:TextIndexLuceneMultilingual ;
text:directory  ;
##text:directory "mem" ;
text:entityMap <#entMap> ;
.
```
This multilingual index manages all localized literals automatically with 
all Lucene localized analyzers.

2) Moreover, with a default Lucene index setup, a localized analyzer can be 
specified (as for SimpleAnalyzer, KeywordAnalyzer, etc...) by this config :

```
<#indexLucene> a text:TextIndexLucene ;
text:directory  ;
text:entityMap <#entMap> ;
text:queryAnalyzer [
a text:LocalizedAnalyzer ;
text:language "en"
]
.
```

reference for JENA-928




[GitHub] jena pull request: jena-text multilingual indexing (take 2)

2015-05-01 Thread amiara514

Github user amiara514 commented on the pull request:

https://github.com/apache/jena/pull/52#issuecomment-98183195
  
Hi Andy,
It's ok for the Jena3 new package format. The last commit already deals 
with it.

Ok, I'll write some tests soon..

For the documentation 2 questions to be sure:
1) Are we talking about this page : 
https://jena.apache.org/documentation/query/text-query.html ? 
2) Is there a special space to write it, or should I write a paragraph in 
this conversation ?

ps: I don't need a dedicated branch for the moment, thanks.



[GitHub] jena pull request: jena-text multilingual indexing (take 2)

2015-05-01 Thread afs

Github user afs commented on the pull request:

https://github.com/apache/jena/pull/52#issuecomment-98085046
  
Hi - this is looking good.

Merge conflicts arise because of the Jena3 migration.  Any packages 
"com.hp.hpl.jena" have become "org.apache.jena". Don't worry about that, we can 
fix that up easily enough.  I just checked the current state and replacing 
"com.hp.hpl.jena" by "org.apache.jena" resulted in a valid patch.

About tests and documentation - it's quite important to have these so that 
future changes in and around jena-text don't accidentally break things and so 
people can find the feature.  The documentation does not need to be large.

I've created [JENA-928](https://issues.apache.org/jira/browse/JENA-928) to 
track this.

@amiara514 Would it help if I created a branch in the codebase so that 
we're working against that as a reference point?





[GitHub] jena pull request: jena-text multilingual indexing (take 2)

2015-04-29 Thread osma

Github user osma commented on the pull request:

https://github.com/apache/jena/pull/52#issuecomment-97329735

> 1. I'm not familiar with assembler configuration. But if you want to give
some help ;-)

I'll try. I've done two jena-text patches in the past, and in both cases I
added support for assembler configuration.

For your patch I think it would be useful to be able to enable/disable
multilingual indexing in a particular jena-text index (default should be to
disable it, for backwards compatibility). Adjusting the particular
language-specific indexers, as I originally suggested, is not very important at
this point.

Thinking about assembler configuration, I think it would be easiest to plug
this in as an alternative to the current Analyzer variants (StandardAnalyzer,
SimpleAnalyzer, KeywordAnalyzer, LowerCaseKeywordAnalyzer). You can look at my
patch in [JENA-776](https://issues.apache.org/jira/browse/JENA-776) that added
the LowerCaseKeywordAnalyzer variant. Basically you need to create a new class
such as MultilingualAnalyzerAssembler (similar to the other *AnalyzerAssembler
classes) and plug support for it into TextAssembler. It shouldn't be very
difficult...

> 2. Ok, I will refactor it to leave previous signatures and calls.
> 3. Sure, it's more clean to extend Entity... ok, todo list.

Excellent!

> For the tests and doc, I 'm pretty busy at the moment.

I can't speak for Jena officially as I'm just an occasional contributor
with an interest in jena-text, but Jena has very good unit test coverage and I
think unit tests are expected from new contributions as well. If you won't
write unit tests for this, I bet nobody else will... Again it's not very hard,
you can look at my LowerCaseKeywordAnalyzer patch for an example.

Regarding documentation, I think that what's needed is to update the main
jena-text document, particularly the [Configuring an
Analyzer](https://jena.apache.org/documentation/query/text-query.html#configuring-an-analyzer)
section. I'm not 100% sure how it is technically maintained these days, but it
used to be maintained via the CMS that [you can
use](http://www.apache.org/dev/cmsref#non-committer) to provide a documentation
patch. But I think it should be fine also to just provide an update as a
comment here on GitHub. Again see JENA-776 for an example, there I just wrote
up the small change to the documentation as a comment and @afs picked it up
from there.


[GitHub] jena pull request: jena-text multilingual indexing (take 2)

2015-04-28 Thread amiara514

Github user amiara514 commented on the pull request:

https://github.com/apache/jena/pull/52#issuecomment-97165961
  
Hi Osma, 
about your comments: 
1. I'm not familiar with assembler configuration. But if you want to give 
some help ;-)
2. Ok, I will refactor it to leave previous signatures and calls.
3. Sure, it's more clean to extend Entity... ok, todo list. 
For the tests and doc, I 'm pretty busy at the moment.



[GitHub] jena pull request: jena-text multilingual indexing (take 2)

2015-04-28 Thread osma

Github user osma commented on the pull request:

https://github.com/apache/jena/pull/52#issuecomment-97152034
  
Hi Alexis!

> Hi Osma, it's now ok for merge.

Excellent!

> For the other part, I will propose soon the "triple deletion" which clean 
the related entry in the index.

OK this is good. I'm actually more interested in that part :)

> For the synchronization of jena-tdb and Lucene transactions, the current 
codebase seems to manage that.
> I'm not 100% sure yet.

Fine.

I took a new look at the code. Some comments/questions:

1. You seem to have added support for all(?) of the language-specific 
analyzers in Lucene. This is good, though even better would be to be able to 
override these in the configuration (similar to the way it is already possible 
to use a custom analyzer with jena-text).

2. You are changing the method signatures for some public methods, e.g. 
TextDatasetFactory.createLuceneIndex, TextDatasetFactory.createLucene, 
TextIndexLucene constructor. I'm not sure that this is a good idea as it might 
break other people's code that call those methods. I suggest that you add 
methods that support the old parameters, which can then call the extended 
variants. This way, you shouldn't need to change the existing tests, as you 
currently do.

3. This seems a bit suspicious:

+if (indexer instanceof TextIndexLuceneMultiLingual) {
+String lang = o.getLiteral().language();
+((TextIndexLuceneMultiLingual)indexer).addEntity(entity, lang);
+}
+else
+indexer.addEntity(entity) ;

Would it be possible to avoid the instanceof check and simply detect the 
language from within the entity? (possibly extending Entity instead, if it 
doesn't keep track of the language of the literal) I.e. this code would just be 
"indexer.addEntity(entity)" as it used to be and the magic to detect the 
language would be within TextIndexLuceneMultiLingual.addEntity(entity) where it 
arguably belongs...

4. I see that you have touched the existing unit tests (which I think may 
be a bad idea, see above) but you have not written unit tests that specifically 
test the multilingual indexing. Would it be possible for you to add some of 
those? These would also serve as examples for the usage and expected behavior.

5. What about documentation?
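
Points 2 and 3 can be sketched together under simplifying assumptions. All types below (`LangEntity`, `Indexer`, `PlainIndexer`, `MultilingualIndexer`, `IndexFactory`) are hypothetical stand-ins, not the jena-text classes: the old factory signature stays and delegates to the extended one, and the entity carries its language so the caller needs no `instanceof` check.

```java
import java.util.ArrayList;
import java.util.List;

// Entity that remembers the language of its literal (null for plain literals).
class LangEntity {
    private final String id;
    private final String language;

    LangEntity(String id, String language) {
        this.id = id;
        this.language = language;
    }

    String getId() { return id; }
    String getLanguage() { return language; }
}

interface Indexer {
    void addEntity(LangEntity entity);
}

// Ignores the language entirely, as a single-analyzer index would.
class PlainIndexer implements Indexer {
    final List<String> added = new ArrayList<>();

    @Override
    public void addEntity(LangEntity entity) {
        added.add(entity.getId());
    }
}

// Reads the language from the entity itself; callers just call addEntity(entity).
class MultilingualIndexer implements Indexer {
    final List<String> added = new ArrayList<>();

    @Override
    public void addEntity(LangEntity entity) {
        String lang = entity.getLanguage();
        if (lang == null || lang.isEmpty()) {
            lang = "default";
        }
        added.add(lang + ":" + entity.getId());
    }
}

class IndexFactory {
    // Old signature kept for existing callers; delegates to the extended variant.
    static Indexer create(String directory) {
        return create(directory, false);
    }

    // Extended variant; the directory argument is unused in this toy model.
    static Indexer create(String directory, boolean multilingual) {
        return multilingual ? new MultilingualIndexer() : new PlainIndexer();
    }
}
```

Existing tests that call the one-argument `create` keep compiling and keep their behaviour, which is the point of suggestion 2.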




[GitHub] jena pull request: jena-text multilingual indexing (take 2)

2015-04-28 Thread amiara514

Github user amiara514 commented on the pull request:

https://github.com/apache/jena/pull/52#issuecomment-97123004
  
Hi Osma, it's now ok for merge. 

For the other part, I will propose soon the "triple deletion" which clean 
the related entry in the index.

For the synchronization of lucene and tdb transactions, the current 
codebase seems to manage that. 
I'm not 100% sure yet. 



[GitHub] jena pull request: jena-text multilingual indexing (take 2)

2015-04-28 Thread osma

Github user osma commented on the pull request:

https://github.com/apache/jena/pull/52#issuecomment-97003751
  
This pull request now requires resolving merge conflicts. I think it 
applied cleanly when it was initiated. Is this because of the Jena3 switch that 
is ongoing? @afs?

What happened to the other part from pull request #51, which was about 
synchronization?



[GitHub] jena pull request: jena-text multilingual indexing (take 2)

2015-04-22 Thread amiara514

GitHub user amiara514 opened a pull request:

https://github.com/apache/jena/pull/52

jena-text multilingual indexing (take 2)

Hi,
This version allows usage of localized Lucene indexes (in jena-text).
All existing Lucene languages analyzers are taken into account.

2 new cases in TextDatasetFactory :
- createLuceneFromLanguage : creation of lucene index with the associated 
Lucene analyzer.
- createLuceneMultilingual : creation of a dynamic multilingual index 
(collection of localized lucene index) depending on triple's literals languages.


On SPARQL side, the pattern is :

?uri text:query (property 'query' ['lang:language']) ; query is dispatched 
to the right Lucene index. 

Note 1: If the 'lang' arg is not present, it's the same default existing 
case.
Note 2 : for the moment, the 'lang' argument is not managed with ?limit and 
?score variables but works if they are not present.
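
The dispatch described above can be sketched as follows. This is a simplified model, not the code in the pull request: `LangDispatch` and its methods are hypothetical, the property function's string arguments are passed as a plain list whose first element is the query text, and indexes are represented by names.

```java
import java.util.List;
import java.util.Map;

class LangDispatch {
    static final String PREFIX = "lang:";

    // Looks for an argument such as "lang:fr" after the query string (position 0).
    // Returns the language, or null when no such argument is present.
    static String extractLang(List<String> stringArgs) {
        for (int i = 1; i < stringArgs.size(); i++) {
            String arg = stringArgs.get(i);
            if (arg.startsWith(PREFIX) && arg.length() > PREFIX.length()) {
                return arg.substring(PREFIX.length());
            }
        }
        return null;
    }

    // Picks the index registered for the language, falling back to "default".
    static String chooseIndex(Map<String, String> indexes, String lang) {
        if (lang != null && indexes.containsKey(lang)) {
            return indexes.get(lang);
        }
        return indexes.get("default");
    }
}
```

When no `lang:` argument is given, `extractLang` returns `null` and `chooseIndex` falls back to the default index, matching Note 1.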


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/LICEF/jena upstream/jena-text-multilingual

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/jena/pull/52.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #52


commit 5e4e1c1432f44151356fe25cc44c87e0085c1873
Author: Alexis Miara 
Date:   2015-04-21T18:19:32Z

change on pom.xml to have local groupId

commit d3f21853c0d0556ad95ae06c393fb8a8619feb35
Author: Alexis Miara 
Date:   2015-04-22T18:55:58Z

Introducing Lucene multilingual index

commit abdc602fe505167562b7ce9218433bf7c99f2f9e
Author: Alexis Miara 
Date:   2015-04-21T18:19:32Z

change on pom.xml to have local groupId

commit a88b6e47a8ab0d595a1a7077f46fd8396ae3e89d
Author: Alexis Miara 
Date:   2015-04-22T18:55:58Z

Introducing Lucene multilingual index

commit ad87c035d841243dfc972d2b0e220f207ed5
Author: Alexis Miara 
Date:   2015-04-22T19:07:09Z

Merge branch 'upstream/jena-text-multilingual' of github.com:LICEF/jena 
into upstream/jena-text-multilingual

commit a125642e1f6bd8e9ec732784d897df6c4e7cd28c
Author: Alexis Miara 
Date:   2015-04-22T19:44:31Z

original pom.xml




