[jira] [Commented] (JENA-1305) Elastic Search Support for Apache Jena Text

ASF GitHub Bot (JIRA) Thu, 16 Mar 2017 05:42:14 -0700

    [ 
https://issues.apache.org/jira/browse/JENA-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15927937#comment-15927937
 ]


ASF GitHub Bot commented on JENA-1305:
--------------------------------------

Github user osma commented on a diff in the pull request:

    https://github.com/apache/jena/pull/227#discussion_r106403179
  
    --- Diff: 
jena-text/src/main/java/org/apache/jena/query/text/TextIndexES.java ---
    @@ -0,0 +1,394 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.jena.query.text;
    +
    +import org.apache.jena.graph.Node;
    +import org.apache.jena.graph.NodeFactory;
    +import org.apache.jena.sparql.util.NodeFactoryExtra;
    +import 
org.elasticsearch.action.admin.indices.exists.indices.IndicesExistsRequest;
    +import 
org.elasticsearch.action.admin.indices.exists.indices.IndicesExistsResponse;
    +import org.elasticsearch.action.get.GetResponse;
    +import org.elasticsearch.action.index.IndexRequest;
    +import org.elasticsearch.action.search.SearchResponse;
    +import org.elasticsearch.action.update.UpdateRequest;
    +import org.elasticsearch.action.update.UpdateResponse;
    +import org.elasticsearch.client.Client;
    +import org.elasticsearch.client.transport.TransportClient;
    +import org.elasticsearch.common.settings.Settings;
    +import org.elasticsearch.common.transport.InetSocketTransportAddress;
    +import org.elasticsearch.common.xcontent.XContentBuilder;
    +import org.elasticsearch.index.query.QueryBuilders;
    +import org.elasticsearch.script.Script;
    +import org.elasticsearch.search.SearchHit;
    +import org.elasticsearch.transport.client.PreBuiltTransportClient;
    +import org.slf4j.Logger;
    +import org.slf4j.LoggerFactory;
    +
    +import java.net.InetAddress;
    +import java.util.*;
    +
    +import static 
org.elasticsearch.common.xcontent.XContentFactory.jsonBuilder;
    +
    +/**
    + * Elastic Search Implementation of {@link TextIndex}
    + *
    + */
    +public class TextIndexES implements TextIndex {
    +
    +    /**
    +     * The definition of the Entity we are trying to Index
    +     */
    +    private final EntityDefinition docDef ;
    +
    +    /**
    +     * Thread safe ElasticSearch Java Client to perform Index operations
    +     */
    +    private static Client client;
    +
    +    /**
    +     * The name of the index. Defaults to 'test'
    +     */
    +    private final String indexName;
    +
    +    static final String CLUSTER_NAME_PARAM = "cluster.name";
    +
    +    static final String NUM_OF_SHARDS_PARAM = "number_of_shards";
    +
    +    static final String NUM_OF_REPLICAS_PARAM = "number_of_replicas";
    +
    +    /**
    +     * Number of maximum results to return in case no limit is specified 
on the search operation
    +     */
    +    static final Integer MAX_RESULTS = 10000;
    +
    +    private boolean isMultilingual ;
    --- End diff --
    
    I don't think that the distinction between multilingual and 
non-multilingual mode for the ES backend makes sense. You are using a 
document-per-entity model that requires you to track also language tags (using 
separate fields), and for that to work properly, what you currently call 
multilingual mode is necessary (otherwise, you cannot handle the "remove one of 
the Berlin labels but not all of them" case). Also, this is still different 
from the multilingual mode in the Lucene backend, which also uses 
language-specific analyzers.
    
    I suggest that you remove the `isMultilingual` attribute and change the 
code so that it always works as if it were true.


> Elastic Search Support for Apache Jena Text 
> --------------------------------------------
>
>                 Key: JENA-1305
>                 URL: https://issues.apache.org/jira/browse/JENA-1305
>             Project: Apache Jena
>          Issue Type: New Feature
>          Components: Text
>    Affects Versions: Jena 3.2.0
>            Reporter: Anuj Kumar
>            Assignee: Osma Suominen
>              Labels: elasticsearch
>   Original Estimate: 240h
>  Remaining Estimate: 240h
>
> This Jira tracks the development of Jena Text ElasticSearch Implementation.
> The goal is to extend Jena Text capability to index, at scale, in 
> ElasticSearch. This implementation would be similar to the Lucene and Solr 
> implementations.
> We will use ES version 5.2.1 for the implementation.
> The following functionalities would be supported:
> * Indexing Literal values
> * Updating indexed values
> * Deleting Indexed values
> * Custom Analyzer Support
> * Configuration using Assembler as well as Java techniques.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Commented] (JENA-1305) Elastic Search Support for Apache Jena Text

Reply via email to