traversing: gotcha. I completely understand now, and I see how the prospector table would help with sniping out those nodes.
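
For reference, here is a minimal sketch of the cutoff as I understand it from the thread: fan out from a starting node, stop once a total number of results is reached, and do not expand any node whose cardinality is too high. This is plain in-memory Python, not Rya code. The function name, the sample triples, and the use of a simple neighbor count as "cardinality" are all assumptions made for illustration; in Rya the counts would presumably come from something like the prospector table.

```python
from collections import defaultdict, deque


def bounded_traversal(triples, start, max_cardinality, max_results):
    """Breadth-first walk that follows edges in both directions.

    A node with more than max_cardinality neighbors is still reported,
    but it is not expanded, so nothing is reached through it.
    """
    # Each triple links its subject and object, whichever way it points.
    neighbors = defaultdict(set)
    for s, _p, o in triples:
        neighbors[s].add(o)
        neighbors[o].add(s)

    visited = [start]
    seen = {start}
    queue = deque([start])
    while queue and len(visited) < max_results:
        node = queue.popleft()
        # Assumption: cardinality is just the in-memory neighbor count.
        if len(neighbors[node]) > max_cardinality:
            continue  # high-cardinality node: do not traverse its children
        for nxt in sorted(neighbors[node]):
            if nxt not in seen:
                seen.add(nxt)
                visited.append(nxt)
                queue.append(nxt)
                if len(visited) >= max_results:
                    break
    return visited


# Hypothetical data: userA owns many datasets, so it has high cardinality.
triples = [
    ("dataset1", "createdBy", "userA"),
    ("dataset1", "producedBy", "job1"),
    ("job1", "runsOn", "cluster1"),
    ("dataset2", "createdBy", "userA"),
    ("dataset3", "createdBy", "userA"),
    ("dataset4", "createdBy", "userA"),
]

print(bounded_traversal(triples, "dataset1", max_cardinality=3, max_results=10))
# prints ['dataset1', 'job1', 'userA', 'cluster1']
```

Here userA is returned as a neighbor of dataset1, but its four datasets are never explored through it, which is the behavior described for high-cardinality nodes.
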
maven: yep, that's the right git repo. Locationtech is required when you build with the 'geoindexing' profile. Regardless, it's strange that maven tried to get the apache pom from locationtech. Deleting the org/apache/apache directory should force maven to download the apache pom from maven central.

--Aaron

On Wed, Mar 1, 2017 at 1:47 PM Liu, Eric <eric....@capitalone.com> wrote:

> Oh, that’s not an issue, that’s what we would like to do when traversing
> through the data. If a node has a high cardinality we don’t want to further
> traverse through its children.
>
> As for installation, did I clone the right repo for Rya? The one I’m using
> has locationtech repos for SNAPSHOT and RELEASE:
> https://github.com/apache/incubator-rya/blob/master/pom.xml
>
> On 3/1/17, 6:09 AM, "Aaron D. Mihalik" <aaron.miha...@gmail.com> wrote:
>
> Repos: The locationtech repo is up [1]. The issue is that your local .m2
> repo is in a bad state. Maven is trying to get the apache pom from
> locationtech. Locationtech does not host that pom; instead it's on maven
> central [2].
>
> Two ways to fix this issue (you should do (1) and that'll fix it... (2) is
> just another option for reference).
>
> 1. Delete your apache pom directory from your local maven repo
> (e.g. rm -rf ~/.m2/repository/org/apache/apache/)
>
> 2. Tell maven to ignore remote repository metadata with the -llr flag
> (e.g. mvn clean install -llr -Pgeoindexing)
>
> Let me know if you have any other issues.
>
> deep/wide: okay, I don't understand this statement: "if the cardinality of
> a node is too high (for example, a user that owns a large number of
> datasets), the neighbors of that node will not be found." Is this a
> property of your current datastore, or is this an issue with Rya?
>
> --Aaron
>
> [1] https://repo.locationtech.org/content/repositories/releases/org/locationtech/geomesa/
> [2] http://repo1.maven.org/maven2/org/apache/apache/17/
>
> On Wed, Mar 1, 2017 at 7:43 AM Puja Valiyil <puja...@gmail.com> wrote:
>
> > Hey Eric,
> > Regarding the repos-- sometimes the location tech repos go down, your best
> > bet is to wait a little bit and try again. You can also download the
> > latest artifacts off of the apache build server.
> > Since location tech is only used for the geo profile we may want to move
> > where that repo is declared (or put it in the geo profile).
> > For your use case, you could look to use the cardinality in the prospector
> > services for individual nodes. Though the prospector services could be run
> > once and then used to be representative (that wouldn't work for your use
> > case), you could run them regularly to keep track of counts for your use
> > case. Are you using the count keyword or just manually counting edges?
> > The count keyword is pretty inefficient currently. We could add that to
> > our list of priorities maybe.
> >
> > Sent from my iPhone
> >
> > > On Mar 1, 2017, at 3:00 AM, Liu, Eric <eric....@capitalone.com> wrote:
> > >
> > > Hey Aaron,
> > >
> > > I’m currently setting up Rya to test these queries with some of our
> > > data. I run into an error when I run ‘mvn clean install’. I attached the
> > > logs, but it seems like I can’t connect to the snapshots repo you’re using.
> > >
> > > As for “deep/wide”, it would be something like starting at a dataset,
> > > then fanning out looking for relations where it is either the subject or
> > > object, such as the user who created it, the job it came from, where it’s
> > > stored, etc. It would recurse on these neighboring nodes until a total
> > > number of results is reached.
> > > However, if the cardinality of a node is too
> > > high (for example, a user that owns a large number of datasets), the
> > > neighbors of that node will not be found. Really, the goal is to find the
> > > most distant relevant relationships possible, and this is our current
> > > naïve way of doing so.
> > >
> > > Do you want to have a short call about this? I think it’d be easier to
> > > explain/answer questions over the phone. I’m free pretty much any time
> > > 1pm-5pm PST tomorrow (3/1).
> > >
> > > Thanks,
> > > Eric
> > >
> > > On 2/24/17, 6:18 AM, "Aaron D. Mihalik" <aaron.miha...@gmail.com> wrote:
> > >
> > > deep vs wide: I played around with the property paths sparql operator and
> > > put up an example here [1]. This is a slightly different query than the
> > > one I sent out before. It would be worth it for us to look at how this is
> > > actually executed by OpenRDF.
> > >
> > > Eric: Could you clarify what you mean by "deep vs wide"? I think I
> > > understand your queries, but I don't have a good intuition about those
> > > terms and how cardinality might figure into a query. It would probably be
> > > a bit more helpful if you provided a model or general description that is
> > > (somewhat) representative of your data.
> > >
> > > --Aaron
> > >
> > > [1] https://github.com/amihalik/sesame-debugging/blob/master/src/main/java/com/github/amihalik/sesame/debugging/PropertyPathsExample.java
> > >
> > >> On Thu, Feb 23, 2017 at 9:42 PM Adina Crainiceanu <ad...@usna.edu> wrote:
> > >>
> > >> Hi Eric,
> > >>
> > >> If you want to query by the Accumulo timestamp, something like
> > >> timeRange(?ts, 13141201490, 13249201490) should work in Rya. I did not try
> > >> it lately, but timeRange() was in Rya originally. Not sure if it was
> > >> removed in later iterations or whether it would be useful for your use
> > >> case.
> > >> First Rya paper
> > >> https://www.usna.edu/Users/cs/adina/research/Rya_CloudI2012.pdf discusses
> > >> time ranges (Section 5.3 at the link above)
> > >>
> > >> Adina
> > >>
> > >>> On Thu, Feb 23, 2017 at 8:31 PM, Puja Valiyil <puja...@gmail.com> wrote:
> > >>>
> > >>> Hey John,
> > >>> I'm pretty sure your pull request was merged-- it was pulled in through
> > >>> another pull request. If not, sorry-- I thought it had been merged and
> > >>> then just not closed. I was going to spend some time doing merges
> > >>> tomorrow so I can get it tomorrow.
> > >>>
> > >>> Sent from my iPhone
> > >>>
> > >>>> On Feb 23, 2017, at 8:13 PM, John Smith <johns0...@gmail.com> wrote:
> > >>>>
> > >>>> I have a pull request that fixes that problem.. it has been stuck in
> > >>>> limbo for months.. https://github.com/apache/incubator-rya-site/pull/1
> > >>>> Can someone merge it into master?
> > >>>>
> > >>>>> On Thu, Feb 23, 2017 at 2:00 PM, Liu, Eric <eric....@capitalone.com> wrote:
> > >>>>>
> > >>>>> Cool, thanks for the help.
> > >>>>> By the way, the link to the Rya Manual is outdated on the rya.apache.org
> > >>>>> site. Should be pointing at
> > >>>>> https://github.com/apache/incubator-rya/blob/master/extras/rya.manual/src/site/markdown/_index.md
> > >>>>>
> > >>>>> On 2/23/17, 12:34 PM, "Aaron D. Mihalik" <aaron.miha...@gmail.com> wrote:
> > >>>>>
> > >>>>> deep vs wide:
> > >>>>>
> > >>>>> A property path query is probably your best bet. Something like:
> > >>>>>
> > >>>>> for the following data:
> > >>>>>
> > >>>>> s:EventA p:causes s:EventB
> > >>>>> s:EventB p:causes s:EventC
> > >>>>> s:EventC p:causes s:EventD
> > >>>>>
> > >>>>> This query would start at EventB and work its way up and down the
> > >>>>> chain:
> > >>>>>
> > >>>>> SELECT * WHERE {
> > >>>>> <s:EventB> (<p:causes>|^<p:causes>)* ?s .
> > >>>>> ?s ?p ?o
> > >>>>> }
> > >>>>>
> > >>>>> On Thu, Feb 23, 2017 at 2:58 PM Meier, Caleb <caleb.me...@parsons.com> wrote:
> > >>>>>
> > >>>>>> Yes, that's a good place to start. If you have external timestamps that
> > >>>>>> are built into your graph using the time ontology in owl (e.g. you have
> > >>>>>> triples of the form (event123, time:inDateTime, 2017-02-23T14:29)), the
> > >>>>>> temporal index is exactly what you want. If you are hoping to query based
> > >>>>>> on the internal timestamps that Accumulo assigns to your triples, then
> > >>>>>> there are some slight tweaks that can be done to facilitate this, but it
> > >>>>>> won't be nearly as efficient (this will require some sort of client side
> > >>>>>> filtering).
> > >>>>>>
> > >>>>>> Caleb A. Meier, Ph.D.
> > >>>>>> Software Engineer II ♦ Analyst
> > >>>>>> Parsons Corporation
> > >>>>>> 1911 N. Fort Myer Drive, Suite 800 ♦ Arlington, VA 22209
> > >>>>>> Office: (703)797-3066
> > >>>>>> caleb.me...@parsons.com ♦ www.parsons.com
> > >>>>>>
> > >>>>>> -----Original Message-----
> > >>>>>> From: Liu, Eric [mailto:eric....@capitalone.com]
> > >>>>>> Sent: Thursday, February 23, 2017 2:27 PM
> > >>>>>> To: dev@rya.incubator.apache.org
> > >>>>>> Subject: Re: Timestamps and Cardinality in Queries
> > >>>>>>
> > >>>>>> We’d like to be able to query by timestamp; specifically, we want to be
> > >>>>>> able to find all statements that were made within a given time range. Is
> > >>>>>> this what I should be looking at?
> > >>>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__cwiki.apache.org_confluence_download_attachments_63407907_Rya-2520Temporal-2520Indexing.pdf-3Fversion-3D1-26modificationDate-3D1464789502000-26api-3Dv2&d=CwIGaQ&c=Nwf-pp4xtYRe0sCRVM8_LWH54joYF7EKmrYIdfxIq10&r=vuVdzYC2kksVZR5STiFwDpzJ7CrMHCgeo_4WXTD0qo8&m=BBheKpKX7A1Ijs8q_TDEUVtdfu-r015XHZjmcw6veAw&s=vLayAkLG0IKGE-0NbwRQKfpcfId05fXE5TX8oMJaa7Q&e=
> > >>>>>>
> > >>>>>> On 2/22/17, 6:21 PM, "Meier, Caleb" <caleb.me...@parsons.com> wrote:
> > >>>>>>
> > >>>>>> Hey Eric,
> > >>>>>>
> > >>>>>> Currently timestamps can't be queried in Rya. Do you need to be able
> > >>>>>> to query by timestamp, or simply discover the timestamp for a given node?
> > >>>>>> Rya does have a temporal index, but that requires you to use a temporal
> > >>>>>> ontology to model the temporal properties of your graph nodes.
> > >>>>>>
> > >>>>>> ________________________________________
> > >>>>>>
> > >>>>>> From: Liu, Eric <eric....@capitalone.com>
> > >>>>>> Sent: Wednesday, February 22, 2017 6:38 PM
> > >>>>>> To: dev@rya.incubator.apache.org
> > >>>>>> Subject: Timestamps and Cardinality in Queries
> > >>>>>>
> > >>>>>> Hi,
> > >>>>>>
> > >>>>>> Continuing from our talk earlier today I was wondering if you could
> > >>>>>> provide more information about how timestamps could be queried in Rya.
> > >>>>>>
> > >>>>>> Also, we are trying to support a type of query that would essentially
> > >>>>>> be limiting on cardinality (different from the normal SPARQL limit because
> > >>>>>> it’s for node cardinality rather than total results). I saw in one of
> > >>>>>> Caleb’s talks that Rya’s query optimization involves checking cardinality
> > >>>>>> first.
> > >>>>>> I was wondering if there would be some way to tap into this feature
> > >>>>>> for usage in queries?
> > >>>>>>
> > >>>>>> Thanks,
> > >>>>>> Eric Liu
> > >>>>>>
> > >>>>>> ________________________________________________________
> > >>>>>>
> > >>>>>> The information contained in this e-mail is confidential and/or
> > >>>>>> proprietary to Capital One and/or its affiliates and may only be used
> > >>>>>> solely in performance of work or services for Capital One. The information
> > >>>>>> transmitted herewith is intended only for use by the individual or entity
> > >>>>>> to which it is addressed. If the reader of this message is not the intended
> > >>>>>> recipient, you are hereby notified that any review, retransmission,
> > >>>>>> dissemination, distribution, copying or other use of, or taking of any
> > >>>>>> action in reliance upon this information is strictly prohibited. If you
> > >>>>>> have received this communication in error, please contact the sender and
> > >>>>>> delete the material from your computer.
> > >>
> > >> --
> > >> Dr. Adina Crainiceanu
> > >> Associate Professor, Computer Science Department
> > >> United States Naval Academy
> > >> 410-293-6822
> > >> ad...@usna.edu
> > >> http://www.usna.edu/Users/cs/adina/
> > >
> > > <log.txt>