Can anyone tell me why the penultimate slide in this file is messed up? I am totally confused. You may also need the prosper layout file from

http://www.md.kth.se/~chr/lyx/examples/prosper/layout_by_Weiss/prosper-by-Weiss.layout

Cheers,
Raphael
#LyX 1.3 created this file. For more info see http://www.lyx.org/
\lyxformat 221
\textclass prosper-by-Weiss
\options pdf
\language english
\inputencoding auto
\fontscheme default
\graphics default
\paperfontsize default
\spacing single 
\papersize Default
\paperpackage a4
\use_geometry 0
\use_amsmath 0
\use_natbib 0
\use_numerical_citations 0
\paperorientation portrait
\secnumdepth 3
\tocdepth 3
\paragraph_separation indent
\defskip medskip
\quotes_language english
\quotes_times 2
\papercolumns 1
\papersides 1
\paperpagestyle default

\layout Title

Distributed and Paged Suffix Trees for Large Genetic Databases
\layout Author

Raphael Clifford and Marek Sergot
\layout Institution

Imperial College, London, UK
\layout Slide

Overview
\layout Itemize

Why do we need distributed and paged suffix trees?
\layout Itemize

What do we mean by a distributed or paged suffix tree?
\layout Itemize

How can we construct one efficiently?
\layout Itemize

What can we do with it?
\layout Slide

Motivation
\layout Standard

Why do we need distributed and paged suffix trees?
\layout Itemize

Suffix trees use a lot of memory.
\layout Itemize

Their construction and querying algorithms have very poor memory locality
 so secondary storage can on not be used.
\layout Itemize

Sequencing projects are producing more and more data.
\layout Itemize

Smarter bioinformatics algorithms are being developed which require more
 complex data structures such as the suffix tree.
\layout Slide

Sparse suffix trees
\layout Standard

A sparse suffix tree (SST) of a string 
\begin_inset Formula $t$
\end_inset 

 is the compacted trie of a subset of the suffixes of 
\begin_inset Formula $t$
\end_inset 

.
\layout Itemize

Evenly spaced suffixes handled by Karkkainen and Ukkonen in 1996.
 
\layout Itemize

Andersson gave a quite different construction algorithm for 
\begin_inset Quotes eld
\end_inset 

suffix trees on words
\begin_inset Quotes erd
\end_inset 

.
 The suffixes that are to be inserted into the tree are defined by delimiters
 that define the end of 
\begin_inset Quotes eld
\end_inset 

words
\begin_inset Quotes erd
\end_inset 

 in the string.
 
\layout Itemize

Andersson's construction algorithm is complicated but is able to cover the
 case of evenly spaced suffixes as well as some others.
 However it is unnecessary as we shall see.
\layout Slide

Sparse suffix trees for distributed computing
\layout Itemize

We distribute the suffix tree by splitting the text 
\emph on 
lexicographically.
\layout Itemize

Define the suffixes that are to inserted by referring to a set of 
\emph on 
valid 
\emph default 
suffixes, 
\begin_inset Formula $V_{z}$
\end_inset 

, where 
\begin_inset Formula $z$
\end_inset 

 is a short fixed substing of the text.
 
\begin_inset Formula $V_{z}$
\end_inset 

 contains the start positions of each suffix that has 
\begin_inset Formula $z$
\end_inset 

 as a prefix.
 We say that a word in string 
\begin_inset Formula $t$
\end_inset 

 is 
\emph on 
nodal 
\emph default 
wrt to a SST if it corresponds to a node in that SST.
 
\layout Itemize

Suffix links will in general point across SSTs defined in this way.
 To perform the construction efficiently we must redefine them.
\layout Slide

Sparse suffix links
\layout Standard

We require that sparse suffix links must point to nodes within the same
 SST and that they are powerful enough to enable linear time construction.
\layout Standard


\series bold 
Definition:
\layout Standard

Consider an input pair 
\begin_inset Formula $(t,V_{z})$
\end_inset 

 and sparse suffix tree 
\begin_inset Formula $T=sst(t,V_{z})$
\end_inset 

.
 Let 
\begin_inset Formula $aw$
\end_inset 

 be nodal in 
\begin_inset Formula $T$
\end_inset 

 and 
\begin_inset Formula $v$
\end_inset 

 be the longest repeated suffix of 
\begin_inset Formula $aw$
\end_inset 

 that occurs in 
\begin_inset Formula $T$
\end_inset 

.
 A 
\emph on 
sparse
\emph default 
 
\emph on 
suffix
\emph default 
 
\emph on 
link
\emph default 
 or 
\emph on 
ssl
\emph default 
 is an unlabelled edge from 
\begin_inset Formula $\overline{aw}$
\end_inset 

 to the
\emph on 
 root
\emph default 
 if 
\begin_inset Formula $|v|<|z|$
\end_inset 

 and from 
\begin_inset Formula $\overline{aw}$
\end_inset 

 to 
\begin_inset Formula $\overline{v}$
\end_inset 

, otherwise.
\layout Slide

Normal suffix tree
\layout Standard
\align center 

\begin_inset Graphics
        filename link.eps
        scale 50
        subcaptionText "The standard suffix tree of $aacacccacacaccacaaa\$$ with 
standard suffix links.."

\end_inset 


\layout Standard

The standard suffix tree of 
\begin_inset Formula $aacacccacacaccacaaa\$$
\end_inset 

 with standard suffix links.
\layout Slide

Merged SSTs
\layout Standard
\align center 

\begin_inset Graphics
        filename link.eps
        scale 50

\end_inset 


\layout Standard

SSTs for 
\begin_inset Formula $aacacccacacaccacaaa\$$
\end_inset 

 with their respective root nodes.
 The sparse suffix links are marked with dashed arrows.
 
\layout Slide

SST construction
\layout Standard

In order to construct the SST online we need to be sure which new suffixes
 might need to be inserted at each turn.
\layout Standard


\series bold 
Definition:
\layout Standard

Consider input pair 
\begin_inset Formula $(p,V_{z})$
\end_inset 

.
 Denote the set valid repeated suffixes of 
\begin_inset Formula $p$
\end_inset 

 by 
\begin_inset Formula $R(p,V_{z})$
\end_inset 

.
 We define this set to include the empty string, 
\begin_inset Formula $\epsilon$
\end_inset 

.
 Let 
\begin_inset Formula $\alpha(p)$
\end_inset 

 be the longest suffix in 
\begin_inset Formula $R(p,V_{z})$
\end_inset 

.
\layout Standard


\series bold 
Example
\series default 
:
\layout Standard

Consider input string 
\begin_inset Formula $t=aabaaa$
\end_inset 

 with prefix 
\begin_inset Formula $p=aabaa$
\end_inset 

 and 
\begin_inset Formula $V_{aa}=\{1,4,5\}$
\end_inset 

.
 
\begin_inset Formula $\alpha(p)=aa$
\end_inset 

 and 
\begin_inset Formula $R(p,V_{aa})=\{ aa,\epsilon\}$
\end_inset 

.
 
\layout Theorem

Consider the input pair 
\begin_inset Formula $(t,V_{z})$
\end_inset 

 with 
\begin_inset Formula $p$
\end_inset 

, a proper prefix of 
\begin_inset Formula $t$
\end_inset 

.
 Let 
\begin_inset Formula $a$
\end_inset 

 be the character in 
\begin_inset Formula $t$
\end_inset 

 that directly follows the prefix 
\begin_inset Formula $p$
\end_inset 

.
 Consider also the set, 
\begin_inset Formula $S$
\end_inset 

, of valid suffixes, 
\begin_inset Formula $sa$
\end_inset 

, for 
\begin_inset Formula $I[1,|pa|]$
\end_inset 

 such that 
\begin_inset Formula $|\alpha(p)\geq|sa|>|alpha(pa)|$
\end_inset 

 and 
\begin_inset Formula $s\in R(p,V_{z})$
\end_inset 

.
 
\begin_inset Formula $sst(pa,V_{z})=sst(p,V_{z})$
\end_inset 


\emph on 
augmented
\emph default 
 
\emph on 
by
\emph default 
 
\begin_inset Formula $S$
\end_inset 

.
\layout Standard


\series bold 
Proof
\series default 
:
\layout Standard

Let 
\begin_inset Formula $sa$
\end_inset 

 be a valid suffix for 
\begin_inset Formula $I[1,|pa|]$
\end_inset 

.
 We need to insert this new suffix into the tree if and only if 
\begin_inset Formula $s\in R(p,V_{z})$
\end_inset 

 but 
\begin_inset Formula $sa\notin R(pa,V_{z})$
\end_inset 

 (
\begin_inset Formula $s=\epsilon$
\end_inset 

 is a special case).
 In this case 
\begin_inset Formula $sa$
\end_inset 

 corresponds to a leaf in 
\begin_inset Formula $sst(pa,V_{z})$
\end_inset 

 but 
\begin_inset Formula $s$
\end_inset 

 does not correspond to a leaf in 
\begin_inset Formula $sst(p,V_{z})$
\end_inset 

 so a new node is necessary.
\layout Slide

slide
\layout Theorem

For a input pair 
\begin_inset Formula $(t,V_{z})$
\end_inset 

 with 
\begin_inset Formula $|z|\geq1$
\end_inset 

, 
\begin_inset Formula $sst(t,V_{z})$
\end_inset 

 can be constructed in 
\begin_inset Formula $O(n)$
\end_inset 

 time, where 
\begin_inset Formula $n=|t|$
\end_inset 

.
\layout Standard

The algorithm in this work can construct both types of sparse suffix tree
 plus one mor
\layout Standard

e important variant using a single  
\the_end

Reply via email to