[ https://issues.apache.org/jira/browse/USERGRID-536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Todd Nine updated USERGRID-536:
-------------------------------
    Summary: Change our index structure for static mapping and cleanup api  (was: Change our index structure to eliminate static mapping)

> Change our index structure for static mapping and cleanup api
> -------------------------------------------------------------
>
>                 Key: USERGRID-536
>                 URL: https://issues.apache.org/jira/browse/USERGRID-536
>             Project: Usergrid
>          Issue Type: Story
>          Components: Stack
>            Reporter: Todd Nine
>            Assignee: Todd Nine
>
> Currently, our dynamic mapping causes several issues with Elasticsearch. We
> should change our mapping to use a static structure, and resolve this
> operational pain.
> We need to make the following changes.
> h2. Modify our IndexScope
> This should more closely resemble the elements of an edge, since it
> represents an edge. It will simplify the use of our query module and make
> development clearer. This scope should be refactored into the following
> objects.
> * IndexEdge - Id, name, timestamp, edgeType (source or target)
> * SearchEdge - Id, name, edgeType
> Note: edgeType is the type of the Id within the edge. Does this Id represent
> a source Id, or does it represent a target Id? The entity to be indexed will
> implicitly be the opposite of the type specified. I.e., if it's a source edge,
> the document is the target. If it's a target edge, the document is the
> source.
> These values should also be stored within our document, so that we can index
> our documents. Note that we perform bidirectional indexing in some cases,
> such as users, groups, etc. When we do this, we need to ensure that we mark
> the direction of the edge appropriately.
> h2. Change default sort ordering
> When sorting is unspecified, we should order by timestamp descending from our
> index edge. This ensures that we retain the correct edge time semantics, and
> will properly order collections and connections.
> h2. Remove the legacy query class
> We don't need the Query class; it has far too many functions to be a
> well-encapsulated object. Instead, we should simply take the string QL, the
> SearchEdge and the limit to return our candidates. From there, we should
> parse and visit the query internally to the query logic, NOT externally.
> h2. Create a static mapping
> The mapping should contain the following static fields.
> * entityId - The entity id
> * entityType - The entity type (from the id)
> * entityVersion - The entity version
> * edgeId - The edge Id
> * edgeName - The edge name
> * edgeTimestamp - The edge timestamp
> * edgeType - source | target
> * searchEdge - edgeId + edgeName + edgeType
> It will then contain an array of "fields". Each of these fields will have the
> following format.
> {code}
> { "name":"[entity field name as a path]", "[field type]":[field value] }
> {code}
> We will define a field type for each type of field. Note that each field
> tuple will always contain a single field and a single value. Possible field
> types are the following.
> * string - This will be mapped into 2 mappings using multi-field mapping. It
> will be an unanalyzed string and an analyzed string. The 2 fields will then
> be "string_u" and "string_a". The Query visitor will need to update the field
> name appropriately
> * long - An unanalyzed long
> * double - An unanalyzed double
> * boolean - An unanalyzed boolean
> The entity path will be a flattened path from the root json element to the
> leaf json element. It can be thought of as a path through the tree of json
> elements. We will use a dot '.' to delimit the fields, e.g. X.Y.Z for nested
> objects. Primitive arrays will contain a field object for each element in
> the array.
> h2. Indexing
> When indexing entities, we will no longer modify or prefix field names.
> They will be inserted into the value exactly as their path appears after
> lowercasing.
> h2. Querying
> When querying, the "contains" operation for a string will need to use the
> "string_a" data type. When using =, we will need to use the "string_u" data
> type. Each criterion will need to use nested object querying, to ensure the
> property name and property value are both part of the same field tuple.
> h3. References
> Multi Field Mapping:
> http://www.elastic.co/guide/en/elasticsearch/reference/current/_multi_fields.html
> Nested Objects:
> http://www.elastic.co/guide/en/elasticsearch/guide/current/nested-objects.html
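The field-tuple layout and the nested querying described in the ticket can be illustrated with a rough Python sketch. This is not Usergrid code: the function names (`field_tuples`, `build_document`, `string_clause`), the `_` separator used for `searchEdge`, the `fields.string_u` / `fields.string_a` paths, and the handling of arrays of objects and of null values are all assumptions made for illustration. Only the document field names, the four field types, the dot-delimited lowercased paths, and the use of a nested query come from the ticket.

```python
from typing import Any, Dict, Iterator


def field_tuples(value: Any, path: str = "") -> Iterator[Dict[str, Any]]:
    """Yield one {"name": <path>, "<field type>": <value>} tuple per leaf value."""
    if isinstance(value, dict):
        for key, child in value.items():
            child_path = f"{path}.{key}" if path else str(key)
            yield from field_tuples(child, child_path)
    elif isinstance(value, (list, tuple)):
        # One tuple per array element, all sharing the array's path.
        # Recursing into arrays of objects is an assumption; the ticket
        # only describes primitive arrays.
        for child in value:
            yield from field_tuples(child, path)
    elif isinstance(value, bool):
        # bool must be tested before int, since bool is a subclass of int.
        yield {"name": path.lower(), "boolean": value}
    elif isinstance(value, int):
        yield {"name": path.lower(), "long": value}
    elif isinstance(value, float):
        yield {"name": path.lower(), "double": value}
    elif isinstance(value, str):
        yield {"name": path.lower(), "string": value}
    # None and other types are skipped (assumption: the ticket is silent here).


def build_document(entity_id: str, entity_type: str, entity_version: str,
                   edge_id: str, edge_name: str, edge_timestamp: int,
                   edge_type: str, entity: Dict[str, Any]) -> Dict[str, Any]:
    """Assemble the static fields plus the flattened "fields" array."""
    if edge_type not in ("source", "target"):
        raise ValueError("edge_type must be 'source' or 'target'")
    return {
        "entityId": entity_id,
        "entityType": entity_type,
        "entityVersion": entity_version,
        "edgeId": edge_id,
        "edgeName": edge_name,
        "edgeTimestamp": edge_timestamp,
        "edgeType": edge_type,
        # The "_" separator is hypothetical; the ticket only says
        # edgeId + edgeName + edgeType.
        "searchEdge": f"{edge_id}_{edge_name}_{edge_type}",
        "fields": list(field_tuples(entity)),
    }


def string_clause(field_name: str, operator: str, value: str) -> Dict[str, Any]:
    """Build a nested query so name and value match within the same tuple."""
    if operator == "=":
        value_query = {"term": {"fields.string_u": value}}    # unanalyzed
    elif operator == "contains":
        value_query = {"match": {"fields.string_a": value}}   # analyzed
    else:
        raise ValueError(f"unsupported operator: {operator}")
    return {
        "nested": {
            "path": "fields",
            "query": {
                "bool": {
                    "must": [
                        {"term": {"fields.name": field_name.lower()}},
                        value_query,
                    ]
                }
            },
        }
    }
```

For example, an entity `{"Address": {"City": "Austin"}, "tags": ["a", "b"]}` would flatten to one tuple named `address.city` and two tuples named `tags`, which is why each criterion has to be wrapped in a nested query: without it, a name from one tuple could pair with a value from another.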