[ https://issues.apache.org/jira/browse/ATLAS-1818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ashutosh Mestry updated ATLAS-1818:
-----------------------------------
Description:
h3. Background
An environment was set up with 100K hive_tables, each with 84 columns. A basic search with the query parameter specified was executed. Results took 75 secs to appear.

h3. Analysis & Findings
A similar test performed with a smaller data set (200 hive_tables, each with 81 columns) also resulted in less-than-ideal performance.

The Atlas Basic Search API uses _graph.indexQuery_ to perform the search, and _graph.indexQuery_ in turn runs the search in _Solr_.

Two aspects affect performance:
* When no limit is specified, the maximum result set size returned by Solr defaults to 100K. In the test scenario, this means the entire result set is returned.
* Once the result set is returned, _EntityDiscoveryService.searchUsingBasicQuery_ does a sequential scan over it to pick out the pertinent data. The cost of this operation is proportional to the size of the result set.

h3. Solution
The following changes will improve performance:
* Solr's maximum result set size is governed by _atlas.graph.index.search.max-result-set-size_. It makes sense to set this to a lower number.
* Modify Solr's configuration file, _solrconfig.xml_, to use _FastLRUCache_.
* Modify _EntityDiscoveryService.searchUsingBasicQuery_ to form a query that takes additional parameters.

> Performance of Basic Search that Uses indexQuery Takes Long Time to Fetch Results
> ---------------------------------------------------------------------------------
>
>                 Key: ATLAS-1818
>                 URL: https://issues.apache.org/jira/browse/ATLAS-1818
>             Project: Atlas
>          Issue Type: Bug
>          Components: atlas-core, atlas-webui
>    Affects Versions: trunk, 0.8-incubating
>            Reporter: Ashutosh Mestry
>            Assignee: Ashutosh Mestry
>             Fix For: trunk, 0.8-incubating
>
>         Attachments: ATLAS-1818.patch
>
>   Original Estimate: 120h
>  Remaining Estimate: 120h

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
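The first Analysis bullet and the first Solution item both concern how many hits come back from the index before _EntityDiscoveryService.searchUsingBasicQuery_ scans them sequentially. The Java sketch below illustrates only that effect, under stated assumptions: it reads a cap from a _java.util.Properties_ entry named after _atlas.graph.index.search.max-result-set-size_ and stops a sequential post-filter after that many hits. The class and method names (CappedScanSketch, scan, maxResultSetSize), the cap value 150, and the filter are hypothetical; this is not how Atlas or its Solr index backend actually consume the property.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.function.Predicate;

/**
 * Hypothetical sketch: bounds a sequential post-filter by a cap read from a property
 * named after atlas.graph.index.search.max-result-set-size. Not Atlas's implementation.
 */
public class CappedScanSketch {
    // Property name taken from the issue; everything else here is illustrative.
    static final String MAX_RESULT_SET_SIZE = "atlas.graph.index.search.max-result-set-size";

    /** Outcome of one scan: the hits kept and how many hits were examined. */
    static final class ScanResult<T> {
        final List<T> kept;
        final int examined;

        ScanResult(List<T> kept, int examined) {
            this.kept = kept;
            this.examined = examined;
        }
    }

    /** Reads the cap, falling back to defaultValue when the property is absent. */
    static int maxResultSetSize(Properties props, int defaultValue) {
        String value = props.getProperty(MAX_RESULT_SET_SIZE);
        return value == null ? defaultValue : Integer.parseInt(value.trim());
    }

    /** Examines hits in order, stopping after 'cap' of them, and keeps those passing 'filter'. */
    static <T> ScanResult<T> scan(Iterable<T> hits, Predicate<T> filter, int cap) {
        List<T> kept = new ArrayList<>();
        int examined = 0;
        for (T hit : hits) {
            if (examined >= cap) {
                break;
            }
            examined++;
            if (filter.test(hit)) {
                kept.add(hit);
            }
        }
        return new ScanResult<>(kept, examined);
    }

    public static void main(String[] args) {
        // Simulated index hits: 100,000 ids standing in for an unbounded result set.
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            hits.add(i);
        }

        Properties props = new Properties();
        props.setProperty(MAX_RESULT_SET_SIZE, "150"); // illustrative lower value
        int cap = maxResultSetSize(props, 100_000);

        Predicate<Integer> wanted = id -> id % 10 == 0; // hypothetical post-filter
        ScanResult<Integer> uncapped = scan(hits, wanted, 100_000);
        ScanResult<Integer> capped = scan(hits, wanted, cap);
        System.out.println(uncapped.examined + " examined, " + uncapped.kept.size() + " kept"); // 100000 examined, 10000 kept
        System.out.println(capped.examined + " examined, " + capped.kept.size() + " kept");     // 150 examined, 15 kept
    }
}
```

Capping bounds the scan cost, but hits beyond the cap are never examined (in this run only 15 of the 10,000 matching ids survive), so a lower cap works best together with a query that already does most of the filtering in the index.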
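The second Solution item switches Solr's caching to _FastLRUCache_. The idea behind the name is least-recently-used eviction: repeated lookups are served from memory, and when the cache is full the entry untouched for longest is dropped. Solr's _FastLRUCache_ is its own concurrent implementation configured in _solrconfig.xml_; the Java sketch below does not reproduce it and only illustrates the LRU policy with _java.util.LinkedHashMap_ in access-order mode. The class name and the query-string keys are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Illustrative LRU cache built on LinkedHashMap's access-order mode.
 * Solr's FastLRUCache is a separate implementation; this only shows the eviction policy.
 */
public class LruCacheSketch<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public LruCacheSketch(int capacity) {
        super(16, 0.75f, true); // accessOrder = true: get() moves an entry to the most-recent end
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each put; evicts the least recently used entry once over capacity.
        return size() > capacity;
    }

    public static void main(String[] args) {
        // Hypothetical keys: query strings mapped to cached result counts.
        LruCacheSketch<String, Integer> cache = new LruCacheSketch<>(2);
        cache.put("q1", 10);
        cache.put("q2", 20);
        cache.get("q1");     // q1 becomes most recently used
        cache.put("q3", 30); // evicts q2, the least recently used
        System.out.println(cache.keySet()); // [q1, q3]
    }
}
```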
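The third Solution item has _EntityDiscoveryService.searchUsingBasicQuery_ form a query that takes additional parameters, presumably so that filtering now done by the sequential scan happens in the index query instead. The issue does not show that query, so the Java sketch below is hypothetical throughout: the builder class, the field names __typeName and name, the Lucene-style "field:value AND ..." syntax, and the default page size of 25 are all assumptions. It ANDs escaped clauses into one query string and carries a limit and offset alongside it.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: folds Basic Search parameters into one Lucene-style query string.
 * Field names and syntax are assumptions, not the query Atlas actually builds.
 */
public class BasicQuerySketch {
    private final List<String> clauses = new ArrayList<>();
    private int limit = 25; // illustrative page size
    private int offset = 0;

    /** Backslash-escapes characters that are special in Lucene query syntax. */
    static String escape(String value) {
        StringBuilder sb = new StringBuilder();
        for (char c : value.toCharArray()) {
            if ("+-&|!(){}[]^\"~*?:\\/ ".indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    /** Adds an exact-match clause; clauses are combined with AND. */
    BasicQuerySketch where(String field, String value) {
        clauses.add(field + ":" + escape(value));
        return this;
    }

    /** Sets how many results to return and how many to skip. */
    BasicQuerySketch page(int limit, int offset) {
        this.limit = limit;
        this.offset = offset;
        return this;
    }

    String queryString() {
        return String.join(" AND ", clauses);
    }

    int limit() {
        return limit;
    }

    int offset() {
        return offset;
    }

    public static void main(String[] args) {
        BasicQuerySketch q = new BasicQuerySketch()
                .where("__typeName", "hive_table")
                .where("name", "sales_2017")
                .page(25, 0);
        System.out.println(q.queryString()); // __typeName:hive_table AND name:sales_2017
        System.out.println("limit=" + q.limit() + " offset=" + q.offset()); // limit=25 offset=0
    }
}
```

Escaping the values keeps characters such as ':' from being parsed as query syntax. Whether the limit and offset reach Solr with the query or are applied elsewhere depends on the graph layer and is not asserted here.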