Re: Timestamps and Cardinality in Queries

Aaron D. Mihalik Wed, 01 Mar 2017 10:56:38 -0800

traversing: gotcha.  I understand now, and I see how the prospector
table would help with snipping out those nodes.
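
To make sure we mean the same thing, here is a minimal sketch of that kind of traversal in Python. It is not Rya code and does not use any Rya API. The `neighbors` and `cardinality` dicts are hypothetical stand-ins for the triples around a node and for per-node counts such as the prospector table could supply, and `max_cardinality` and `max_results` are made-up parameters.

```python
from collections import deque


def bounded_traverse(start, neighbors, cardinality, max_cardinality, max_results):
    """Breadth-first walk that does not expand high-cardinality nodes.

    neighbors:   dict node -> list of adjacent nodes (hypothetical stand-in
                 for the triples where the node is subject or object)
    cardinality: dict node -> count (hypothetical stand-in for counts read
                 from a prospector-style table)
    """
    seen = {start}
    results = []
    queue = deque([start])
    while queue and len(results) < max_results:
        node = queue.popleft()
        # A node over the threshold is still reported when it is reached,
        # but its own neighbors are never visited.
        if cardinality.get(node, 0) > max_cardinality:
            continue
        for nxt in neighbors.get(node, []):
            if nxt in seen:
                continue
            seen.add(nxt)
            results.append(nxt)
            if len(results) >= max_results:
                break
            queue.append(nxt)
    return results


# Toy graph: a dataset, the user who created it, and the job it came from.
graph = {
    "dataset1": ["userA", "job7"],
    "userA": ["dataset1", "dataset2", "dataset3", "dataset4"],
    "job7": ["dataset1", "cluster1"],
}
counts = {"dataset1": 2, "userA": 4, "job7": 2}

found = bounded_traverse("dataset1", graph, counts, max_cardinality=3, max_results=10)
print(found)
```

With the threshold at 3, userA is found but not expanded, so dataset2 through dataset4 never show up, while the walk still continues through job7.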


maven: yep, that's the right git repo.  Locationtech is required when you
build with the 'geoindexing' profile.  Regardless, it's strange that maven
tried to get the apache pom from locationtech.  Deleting the
org/apache/apache directory should force maven to download the apache pom
from maven central.

--Aaron

On Wed, Mar 1, 2017 at 1:47 PM Liu, Eric <eric....@capitalone.com> wrote:

> Oh, that’s not an issue, that’s what we would like to do when traversing
> through the data. If a node has a high cardinality we don’t want to further
> traverse through its children.
>
> As for installation, did I clone the right repo for Rya? The one I’m using
> has locationtech repos for SNAPSHOT and RELEASE:
> https://github.com/apache/incubator-rya/blob/master/pom.xml
>
> On 3/1/17, 6:09 AM, "Aaron D. Mihalik" <aaron.miha...@gmail.com> wrote:
>
>     Repos: The locationtech repo is up [1].  The issue is that your local
> .m2
>     repo is in a bad state.  Maven is trying to get the apache pom from
>     locationtech.  Locationtech does not host that pom, instead it's on
> maven
>     central [2].
>
>     Two ways to fix this issue (you should do (1) and that'll fix it...
> (2) is
>     just another option for reference).
>
>     1. Delete your apache pom directory from your local maven repo (e.g.
> rm -rf
>     ~/.m2/repository/org/apache/apache/)
>
>     2. Tell maven to ignore remote repository metadata with the -llr flag
> (e.g.
>     mvn clean install -llr -Pgeoindexing)
>
>     Let me know if you have any other issues.
>
>     deep/wide: okay, I don't understand this statement: "if the
> cardinality of
>     a node is too high (for example, a user that owns a large number of
>     datasets), the neighbors of that node will not be found."  Is this a
>     property of your current datastore, or is this an issue with Rya?
>
>     --Aaron
>
>     [1]
>
> https://repo.locationtech.org/content/repositories/releases/org/locationtech/geomesa/
>     [2] http://repo1.maven.org/maven2/org/apache/apache/17/
>
>     On Wed, Mar 1, 2017 at 7:43 AM Puja Valiyil <puja...@gmail.com> wrote:
>
>     > Hey Eric,
>     > Regarding the repos-- sometimes the location tech repos go down,
> your best
>     > bet is to wait a little bit and try again.  You can also download the
>     > latest artifacts off of the apache build server.
>     > Since location tech is only used for the geo profile we may want to
> move
>     > where that repo is declared (or put it in the geo profile).
>     > For your use case, you could look to use the cardinality in the
> prospector
>     > services for individual nodes.  Though the prospector services could
> be run
>     > once and then used to be representative (that wouldn't work for your
> use
>     > case), you could run them regularly to keep track of counts for your
> use
>     > case.  Are you using the count keyword or just manually counting
> edges?
>     > The count keyword is pretty inefficient currently.  We could add
> that to
>     > our list of priorities maybe.
>     >
>     > Sent from my iPhone
>     >
>     > > On Mar 1, 2017, at 3:00 AM, Liu, Eric <eric....@capitalone.com>
> wrote:
>     > >
>     > > Hey Aaron,
>     > >
>     > > I’m currently setting up Rya to test these queries with some of our
>     > data. I ran into an error when I ran ‘mvn clean install’; I attached
> the
>     > logs but it seems like I can’t connect to the snapshots repo you’re
> using.
>     > >
>     > > As for “deep/wide”, it would be something like starting at a
> dataset,
>     > then fanning out looking for relations where it is either the
> subject or
>     > object, such as the user who created it, the job it came from, where
> it’s
>     > stored, etc. It would recurse on these neighboring nodes until a
> total
>     > number of results is reached. However, if the cardinality of a node
> is too
>     > high (for example, a user that owns a large number of datasets), the
>     > neighbors of that node will not be found. Really, the goal is to
> find the
>     > most distant relevant relationships possible, and this is our
> current
>     > naïve way of doing so.
>     > >
>     > > Do you want to have a short call about this? I think it’d be
> easier to
>     > explain/answer questions over the phone. I’m free pretty much any
> time
>     > 1pm-5pm PST tomorrow (3/1).
>     > >
>     > > Thanks,
>     > > Eric
>     > >
>     > > On 2/24/17, 6:18 AM, "Aaron D. Mihalik" <aaron.miha...@gmail.com>
> wrote:
>     > >
>     > >    deep vs wide: I played around with the property paths sparql
> operator
>     > and
>     > >    put up an example here [1].  This is a slightly different query
> than
>     > the
>     > >    one I sent out before.  It would be worth it for us to look at
> how
>     > this is
>     > >    actually executed by OpenRDF.
>     > >
>     > >    Eric: Could you clarify what you mean by "deep vs wide"?  I think I
> understand your
>     > >    queries, but I don't have a good intuition about those terms
> and how
>     > >    cardinality might figure into a query.  It would probably be a
> bit
>     > more
>     > >    helpful if you provided a model or general description that is
>     > (somewhat)
>     > >    representative of your data.
>     > >
>     > >    --Aaron
>     > >
>     > >    [1]
>     > >
>     >
> https://github.com/amihalik/sesame-debugging/blob/master/src/main/java/com/github/amihalik/sesame/debugging/PropertyPathsExample.java
>     > >
>     > >>    On Thu, Feb 23, 2017 at 9:42 PM Adina Crainiceanu <
> ad...@usna.edu>
>     > wrote:
>     > >>
>     > >> Hi Eric,
>     > >>
>     > >> If you want to query by the Accumulo timestamp, something like
>     > >> timeRange(?ts, 13141201490, 13249201490) should work in Rya. I
> did not
>     > try
>     > >> it lately, but timeRange() was in Rya originally. Not sure if it
> was
>     > >> removed in later iterations or whether it would be useful for
> your use
>     > >> case. First Rya paper
>     > >> https://www.usna.edu/Users/cs/adina/research/Rya_CloudI2012.pdf
>     > discusses
>     > >> time ranges (Section 5.3 at the link above)
>     > >>
>     > >> Adina
>     > >>
>     > >>> On Thu, Feb 23, 2017 at 8:31 PM, Puja Valiyil <puja...@gmail.com
> >
>     > wrote:
>     > >>>
>     > >>> Hey John,
>     > >>> I'm pretty sure your pull request was merged-- it was pulled in
> through
>     > >>> another pull request.  If not, sorry-- I thought it had been
> merged and
>     > >>> then just not closed.  I was going to spend some time doing
> merges
>     > >> tomorrow
>     > >>> so I can get it tomorrow.
>     > >>>
>     > >>> Sent from my iPhone
>     > >>>
>     > >>>> On Feb 23, 2017, at 8:13 PM, John Smith <johns0...@gmail.com>
> wrote:
>     > >>>>
>     > >>>> I have a pull request that fixes that problem.. it has been
> stuck in
>     > >>> limbo
>     > >>>> for months..
> https://github.com/apache/incubator-rya-site/pull/1  Can
>     > >>>> someone merge it into master?
>     > >>>>
>     > >>>>> On Thu, Feb 23, 2017 at 2:00 PM, Liu, Eric <
> eric....@capitalone.com>
>     > >>> wrote:
>     > >>>>>
>     > >>>>> Cool, thanks for the help.
>     > >>>>> By the way, the link to the Rya Manual is outdated on the
>     > >>>>> rya.apache.org site. Should be pointing at
>     > >>>>> https://github.com/apache/incubator-rya/blob/master/extras/rya.manual/src/site/markdown/_index.md
>     > >>>>>
>     > >>>>> On 2/23/17, 12:34 PM, "Aaron D. Mihalik" <
> aaron.miha...@gmail.com>
>     > >>> wrote:
>     > >>>>>
>     > >>>>>   deep vs wide:
>     > >>>>>
>     > >>>>>   A property path query is probably your best bet.  Something
> like:
>     > >>>>>
>     > >>>>>   for the following data:
>     > >>>>>
>     > >>>>>   s:EventA p:causes s:EventB
>     > >>>>>   s:EventB p:causes s:EventC
>     > >>>>>   s:EventC p:causes s:EventD
>     > >>>>>
>     > >>>>>
>     > >>>>>   This query would start at EventB and work its way up and
> down the
>     > >>>>> chain:
>     > >>>>>
>     > >>>>>   SELECT * WHERE {
>     > >>>>>      <s:EventB> (<p:causes>|^<p:causes>)* ?s . ?s ?p ?o
>     > >>>>>   }
>     > >>>>>
>     > >>>>>
>     > >>>>>   On Thu, Feb 23, 2017 at 2:58 PM Meier, Caleb <
>     > >>> caleb.me...@parsons.com>
>     > >>>>>   wrote:
>     > >>>>>
>     > >>>>>> Yes, that's a good place to start.  If you have external
> timestamps
>     > >>>>> that
>     > >>>>>> are built into your graph using the time ontology in OWL (e.g.,
> you
>     > >>>>> have
>     > >>>>>> triples of the form (event123, time:inDateTime,
> 2017-02-23T14:29)),
>     > >>>>> the
>     > >>>>>> temporal index is exactly what you want.  If you are hoping
> to query
>     > >>>>> based
>     > >>>>>> on the internal timestamps that Accumulo assigns to your
> triples,
>     > >>>>> then
>     > >>>>>> there are some slight tweaks that can be done to facilitate
> this,
>     > >>>>> but it
>     > >>>>>> won't be nearly as efficient (this will require some sort of
> client
>     > >>>>> side
>     > >>>>>> filtering).
>     > >>>>>>
>     > >>>>>> Caleb A. Meier, Ph.D.
>     > >>>>>> Software Engineer II ♦ Analyst
>     > >>>>>> Parsons Corporation
>     > >>>>>> 1911 N. Fort Myer Drive, Suite 800 ♦ Arlington, VA 22209
>     > >>>>>> Office:  (703)797-3066
>     > >>>>>> caleb.me...@parsons.com ♦ www.parsons.com
>     > >>>>>>
>     > >>>>>> -----Original Message-----
>     > >>>>>> From: Liu, Eric [mailto:eric....@capitalone.com]
>     > >>>>>> Sent: Thursday, February 23, 2017 2:27 PM
>     > >>>>>> To: dev@rya.incubator.apache.org
>     > >>>>>> Subject: Re: Timestamps and Cardinality in Queries
>     > >>>>>>
>     > >>>>>> We’d like to be able to query by timestamp; specifically, we
> want to
>     > >>>>> be
>     > >>>>>> able to find all statements that were made within a given time
>     > >>>>> range. Is
>     > >>>>>> this what I should be looking at?
>     > >>>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__cwiki.
>     > >>>>> apache.org_confluence_download_attachments_63407907_
>     > >>>>>
> Rya-2520Temporal-2520Indexing.pdf-3Fversion-3D1-26modificationDate-
>     > >>>>> 3D1464789502000-26api-3Dv2&d=CwIGaQ&c=Nwf-pp4xtYRe0sCRVM8_
>     > >>>>> LWH54joYF7EKmrYIdfxIq10&r=vuVdzYC2kksVZR5STiFwDpzJ7CrMHC
>     > >>> geo_4WXTD0qo8&m=
>     > >>>>> BBheKpKX7A1Ijs8q_TDEUVtdfu-r015XHZjmcw6veAw&s=vLayAkLG0IKGE-
>     > >>>>> 0NbwRQKfpcfId05fXE5TX8oMJaa7Q&e=
>     > >>>>>>
>     > >>>>>>
>     > >>>>>>
>     > >>>>>> On 2/22/17, 6:21 PM, "Meier, Caleb" <caleb.me...@parsons.com>
>     > wrote:
>     > >>>>>>
>     > >>>>>>
>     > >>>>>>
>     > >>>>>>   Hey Eric,
>     > >>>>>>
>     > >>>>>>
>     > >>>>>>
>     > >>>>>>   Currently timestamps can't be queried in Rya.  Do you need
> to be
>     > >>>>> able
>     > >>>>>> to query by timestamp, or simply discover the timestamp for a
> given
>     > >>>>> node?
>     > >>>>>> Rya does have a temporal index, but that requires you to use a
>     > >>>>> temporal
>     > >>>>>> ontology to model the temporal properties of your graph nodes.
>     > >>>>>>
>     > >>>>>>   ________________________________________
>     > >>>>>>
>     > >>>>>>   From: Liu, Eric <eric....@capitalone.com>
>     > >>>>>>
>     > >>>>>>   Sent: Wednesday, February 22, 2017 6:38 PM
>     > >>>>>>
>     > >>>>>>   To: dev@rya.incubator.apache.org
>     > >>>>>>
>     > >>>>>>   Subject: Timestamps and Cardinality in Queries
>     > >>>>>>
>     > >>>>>>
>     > >>>>>>
>     > >>>>>>   Hi,
>     > >>>>>>
>     > >>>>>>
>     > >>>>>>
>     > >>>>>>   Continuing from our talk earlier today I was wondering if
> you
>     > >>>>> could
>     > >>>>>> provide more information about how timestamps could be
> queried in
>     > >>>>> Rya.
>     > >>>>>>
>     > >>>>>>   Also, we are trying to support a type of query that would
>     > >>>>> essentially
>     > >>>>>> be limiting on cardinality (different from the normal SPARQL
> limit
>     > >>>>> because
>     > >>>>>> it’s for node cardinality rather than total results). I saw
> in one
>     > of
>     > >>>>>> Caleb’s talks that Rya’s query optimization involves checking
>     > >>>>> cardinality
>     > >>>>>> first. I was wondering if there would be some way to tap into
> this
>     > >>>>> feature
>     > >>>>>> for usage in queries?
>     > >>>>>>
>     > >>>>>>
>     > >>>>>>
>     > >>>>>>   Thanks,
>     > >>>>>>
>     > >>>>>>   Eric Liu
>     > >>>>>>
>     > >>>>>>   ________________________________________________________
>     > >>>>>>
>     > >>>>>>
>     > >>>>>>
>     > >>>>>>   The information contained in this e-mail is confidential
> and/or
>     > >>>>>> proprietary to Capital One and/or its affiliates and may only
> be
>     > used
>     > >>>>>> solely in performance of work or services for Capital One. The
>     > >>>>> information
>     > >>>>>> transmitted herewith is intended only for use by the
> individual or
>     > >>>>> entity
>     > >>>>>> to which it is addressed. If the reader of this message is
> not the
>     > >>>>> intended
>     > >>>>>> recipient, you are hereby notified that any review,
> retransmission,
>     > >>>>>> dissemination, distribution, copying or other use of, or
> taking of
>     > >>>>> any
>     > >>>>>> action in reliance upon this information is strictly
> prohibited. If
>     > >>>>> you
>     > >>>>>> have received this communication in error, please contact the
> sender
>     > >>>>> and
>     > >>>>>> delete the material from your computer.
>     > >>>>>>
>     > >>>>>>
>     > >>>>>>
>     > >>>>>>
>     > >>>>>>
>     > >>>>>>
>     > >>>>>>
>     > >>>>>
>     > >>>>>
>     > >>>>>
>     > >>>
>     > >>
>     > >>
>     > >>
>     > >> --
>     > >> Dr. Adina Crainiceanu
>     > >> Associate Professor, Computer Science Department
>     > >> United States Naval Academy
>     > >> 410-293-6822
>     > >> ad...@usna.edu
>     > >> http://www.usna.edu/Users/cs/adina/
>     > >>
>     > >
>     > >
>     > > <log.txt>
>     >
>
>
>
